Pre-Conference Talk by CHEANG Chi Seng | Do LLMs Really Know What They Don't Know? Internal States Mainly Reflect Knowledge Recall Rather Than Truthfulness


Speaker:

CHEANG Chi Seng
Ph.D. Student
School of Computing and Information Systems
Singapore Management University

Date: 11 June 2026, Thursday
Time: 11:00am – 11:30am
Venue: Meeting room 4.4, Level 4, School of Computing and Information Systems 1, Singapore Management University, 80 Stamford Road, Singapore 178902

Please register by 9 June 2026.

About the Talk

Recent work suggests that LLMs "know what they don't know", positing that hallucinated and factually correct outputs arise from distinct internal processes and can therefore be distinguished using internal signals. However, hallucinations have multifaceted causes: beyond simple knowledge gaps, they can emerge from training incentives that encourage models to exploit statistical shortcuts or spurious associations learned during pretraining. In this paper, we argue that when LLMs rely on such learned associations to produce hallucinations, their internal processes are mechanistically similar to those of factual recall, as both stem from strong statistical correlations encoded in the model's parameters. To test this, we propose a novel taxonomy that categorizes hallucinations into Unassociated Hallucinations (UHs), where outputs lack parametric grounding, and Associated Hallucinations (AHs), which are driven by spurious associations. Through mechanistic analysis, we compare the computational processes and hidden-state geometries of both types with those of factually correct outputs. Our results show that hidden states primarily reflect whether the model is recalling parametric knowledge, rather than whether the output itself is truthful. Consequently, AHs exhibit hidden-state geometries that largely overlap with those of factual outputs, rendering standard detection methods ineffective. In contrast, UHs exhibit distinctive, clustered representations that facilitate reliable detection.

This is a pre-conference talk for the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026).

About the Speaker

CHEANG Chi Seng is a first-year PhD student in Computer Science at Singapore Management University, supervised by Dr. DENG Yang. His research interests include natural language processing, trustworthy AI, and hallucination detection and mitigation in large language models.
