
PhD Dissertation Proposal by YANG Zhou | Towards Robust, Secure, and Privacy-aware Large Language Models of Code


 

Towards Robust, Secure, and Privacy-aware Large Language Models of Code

 

YANG Zhou

PhD Candidate
School of Computing and Information Systems
Singapore Management University
 


Research Area

Dissertation Committee

Research Advisor

Prof. David LO, Singapore Management University

Dissertation Committee Members

 

Date

26 July 2024 (Friday)

Time

10:30am – 11:30am

Venue

Meeting Room 5.1, Level 5
School of Computing and Information Systems 1, Singapore Management University, 80 Stamford Road, Singapore 178902

Please register by 25 July 2024.

We look forward to seeing you at this research seminar.

 

ABOUT THE TALK

Artificial Intelligence, specifically large language models for code (LLM4Code), has reshaped software engineering. LLM4Code demonstrate strong functional capabilities in generating and summarizing code, predicting vulnerabilities, and more. Yet researchers have recently revealed that LLM4Code fail to satisfy non-functional properties. In a recent survey, we analyze 146 papers and identify six important properties that deserve attention from researchers and practitioners: robustness, security, privacy, explainability, efficiency, and usability.

In this talk, I will highlight my research on three of these properties: robustness, security, and privacy. First, LLM4Code are not robust: we show that human-imperceptible perturbations can make models produce wrong results. Second, LLM4Code are vulnerable to backdoor attacks and membership inference attacks, and it is worrisome that existing methods cannot fully address such threats. Third, we expose that LLM4Code can memorize their training data, exposing vulnerable, sensitive, and privacy-revealing code to end users, which potentially causes security and ethical issues. I will also briefly explain our latest work on effectively mitigating such undesired behavior in a time-efficient manner. To summarize, we provide a higher-level "ecosystem perspective" for analyzing LLM4Code, aiming to improve trustworthiness and transparency in building the next generation of AI tools for software engineering.

 

ABOUT THE SPEAKER

YANG Zhou is a third-year PhD candidate at Singapore Management University, mentored by Prof. David LO. Zhou's main research focus is "beyond accuracy of large language models for code (LLM4Code)": analyzing and assuring a broad set of properties of LLM4Code ecosystems, including robustness, security, privacy, efficiency, explainability, and usability. Zhou has also published on general AI testing, including evaluating the correctness of speech recognition systems, the fairness of NLP models, and security threats in reinforcement learning models.