| |
 Safety-Constrained Learning for Intelligent Systems: From Risk-Aware Planning to Safety Aligned Large Language Models |  | LU Yuxiao PhD Candidate School of Computing and Information Systems Singapore Management University | Research Area Dissertation Committee Research Advisor Co-Research Advisor - Arunesh SINHA, Assistant Professor, Department of Management Science & Information Systems, Rutgers Business School, Rutgers University
Committee Members External Member - Thanh Hong NGUYEN, Associate Professor, Department of Computer Science, University of Oregon
|
| | Date 15 May 2026 (Friday) | Time 9:00am - 10:00am | Venue Meeting room 5.1, Level 5 School of Computing and Information Systems 1, Singapore Management University, 80 Stamford Road Singapore 178902 | Please register by 13 May 2026. We look forward to seeing you at this research seminar. 
|
|
|
| | ABOUT THE TALK This dissertation studies how to build intelligent systems that are safe without sacrificing their usefulness. It addresses this challenge in two settings: long-horizon reinforcement learning and large language model alignment. For sequential decision-making, the thesis proposes a constrained hierarchical reinforcement learning framework that treats safety as an explicit and adjustable planning objective. By combining high-level constrained subgoal planning with low-level goal-conditioned control, the method improves task success, constraint satisfaction, and re-planning efficiency in safety-critical environments. For language alignment, the thesis develops two complementary methods for improving LLM safety. The first is a semantics-guided supervised fine-tuning approach that uses harmful data only and encourages models to move away from unsafe responses while preserving general capability. The second, Discernment via Contrastive Refinement (DCR), reduces over-refusal by separating the representations of truly toxic and seemingly toxic prompts before safety alignment. Together, these studies show that safety can be enforced through principled constraints and representation-level interventions, leading to AI systems that are both reliable and useful. | | | SPEAKER BIOGRAPHY Yuxiao LU is a final-year PhD candidate at Singapore Management University, supervised by Prof. Pradeep Varakantham and Prof. Arunesh Sinha. His research lies at the intersection of artificial intelligence, reinforcement learning, and large language model safety, with a focus on developing intelligent systems that are both safe and useful under practical constraints. Before joining SMU, Yuxiao received his B.Sc. from Shandong University. His work has been published in leading AI and computer science venues, including AAAI, ICLR, and JCSS. |
|