Pre-Conference Talk by ZHU Jiawen | Unleashing Vision-Language Semantics for Deepfake Video Detection


Unleashing Vision-Language Semantics for Deepfake Video Detection

Speaker:

ZHU Jiawen
PhD Candidate
School of Computing and Information Systems
Singapore Management University

 

Date: 20 May 2026 (Wednesday)

Time: 3:30pm – 4:00pm

Venue: Meeting room 4.4, Level 4, School of Computing and Information Systems 1,
Singapore Management University,
80 Stamford Road,
Singapore 178902

Please register by 18 May 2026.

About the Talk

Recent Deepfake Video Detection (DFD) studies have demonstrated that pre-trained Vision-Language Models (VLMs) such as CLIP exhibit strong generalization capabilities in detecting artifacts across different identities. However, existing approaches focus on leveraging visual features only, overlooking these models' most distinctive strength: the rich vision-language semantics embedded in the latent space. We propose VLAForge, a novel DFD framework that unleashes the potential of such cross-modal semantics to enhance the model's discriminability in deepfake detection. This work i) enhances the visual perception of the VLM through a ForgePerceiver, which acts as an independent learner to capture diverse, subtle forgery cues both granularly and holistically while preserving the pre-trained Vision-Language Alignment (VLA) knowledge, and ii) provides a complementary discriminative cue, the Identity-Aware VLA score, derived by coupling cross-modal semantics with the forgery cues learned by ForgePerceiver. Notably, the VLA score is augmented by identity-prior-informed text prompting to capture authenticity cues tailored to each identity, thereby enabling more discriminative cross-modal semantics. Comprehensive experiments on video DFD benchmarks, including classical face-swapping forgeries and recent full-face generation forgeries, demonstrate that VLAForge substantially outperforms state-of-the-art methods at both frame and video levels. Code is available at https://github.com/mala-lab/VLAForge.

This is a pre-conference talk for the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026).

About the Speaker

Jiawen ZHU is a final-year PhD candidate at Singapore Management University under the supervision of Prof. Guansong Pang. She has published multiple papers at top-tier conferences, including CVPR and ICCV. Her research interests include computer vision and open-world learning, with a particular focus on generalist anomaly detection and deepfake artifact detection.