Evaluating and Enhancing Safety Alignment of Large Language Models

ZHAO Wei
PhD Candidate, School of Computing and Information Systems, Singapore Management University
Date: 6 November 2024 (Wednesday)
Time: 1:00pm – 2:00pm
Venue: Meeting Room 5.1, Level 5, School of Computing and Information Systems 1, Singapore Management University, 80 Stamford Road, Singapore 178902

Please register by 5 November 2024. We look forward to seeing you at this research seminar.
ABOUT THE TALK

Large Language Models (LLMs) have transformed the field of natural language processing, but concerns about their security and reliability persist. This dissertation investigates advanced techniques for assessing and improving LLM security. First, we present CASPER, a lightweight causality-analysis framework for evaluating LLM behavior at both the layer and neuron levels. Building on these findings, we introduce Layer-specific Editing (LED), a knowledge-editing-based method that enhances LLM alignment against adversarial attacks. Furthermore, our detailed examination of adversarial suffixes reveals that they act as significant features within LLMs, and that fine-tuning on benign data can degrade safety alignment. This research deepens the understanding of LLM security and offers practical tools for improving model safety alignment.

ABOUT THE SPEAKER

ZHAO Wei is a PhD candidate in Computer Science at the SMU School of Computing and Information Systems, supervised by Prof. SUN Jun. His research focuses on LLM safety.