
PhD Dissertation Defense by YANG Zhou | Towards Robust, Secure, and Privacy-aware Large Language Models of Code


Towards Robust, Secure, and Privacy-aware Large Language Models of Code

YANG Zhou

PhD Candidate 
School of Computing and Information Systems 
Singapore Management University 
 



Dissertation Committee

Research Advisor

Dissertation Committee Member

External Member
  • Premkumar DEVANBU, Distinguished Professor, Department of Computer Science, University of California, Davis
 

Date

11 December 2024 (Wednesday)

Time

9:00am – 10:00am

Venue

Meeting room 5.1, 
Level 5
School of Computing and Information Systems 1, 
Singapore Management University, 
80 Stamford Road, 
Singapore 178902

Please register by 10 December 2024.

We look forward to seeing you at this research seminar.

 

ABOUT THE TALK

Large language models of code (LLM4Code) have reshaped software engineering, demonstrating strong functional capability in generating and summarizing code, predicting vulnerabilities, and more. Yet, our recent work shows that LLM4Code often fails to satisfy important non-functional properties. This dissertation presents a series of works on evaluating and enhancing the robustness, security, and privacy of the LLM4Code ecosystem. 

The dissertation starts with the first systematic literature review in this area, analyzing 146 papers to identify six important properties that deserve attention from researchers and practitioners: robustness, security, privacy, explainability, efficiency, and usability. We then highlight four research papers. First, LLM4Code is not robust: human-imperceptible perturbations can make models produce wrong results. Second, LLM4Code is vulnerable to backdoor attacks, and it is worrisome that existing defenses cannot fully address such threats. Third, LLM4Code can memorize its training data, exposing vulnerable, sensitive, and privacy-revealing code to end users, which potentially causes security and ethical issues. Fourth, LLM4Code is threatened by membership inference attacks. Finally, we discuss our latest study on building and analyzing the “LLM4Code ecosystem,” aiming to improve trustworthiness and transparency in building the next generation of AI tools for software engineering.
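As a toy illustration of the robustness issue mentioned above: a semantics-preserving edit such as renaming a local variable leaves program behavior unchanged, yet perturbations of exactly this kind can flip a model's prediction. The sketch below (a hypothetical helper for illustration, not the dissertation's attack code) applies such a rename using Python's `ast` module:

```python
import ast


class RenameIdentifier(ast.NodeTransformer):
    """Rename one identifier throughout a parsed program.

    The edit is semantics-preserving: the transformed code computes
    exactly the same result as the original.
    """

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Variable reads and writes.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        # Function parameters.
        if node.arg == self.old:
            node.arg = self.new
        return node


src = (
    "def total(xs):\n"
    "    acc = 0\n"
    "    for x in xs:\n"
    "        acc += x\n"
    "    return acc"
)

# Rename `acc` to an unnatural-looking token; behavior is unchanged,
# but a non-robust model may now produce a different output for this code.
tree = RenameIdentifier("acc", "zzq").visit(ast.parse(src))
perturbed = ast.unparse(tree)
print(perturbed)
```

An adversarial attack in this setting would search over many candidate renames (or other semantics-preserving edits) for one that changes the model's prediction, while a human reader still sees functionally identical code.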

 

ABOUT THE SPEAKER

YANG Zhou focuses on different properties of LLM4Code, e.g., robustness, security, and privacy. He won the SMU Research Staff Excellence Award, the ACM SIGSOFT Distinguished Paper Award, and the 1st place ACM SRC Award. Zhou received his MSc degree from UCL and his B.Eng. degree from Yangzhou University (yes, the same as his full name!). Zhou likes to walk on the streets and freeze memorable moments with his Fujifilm X100 camera.