
PhD Dissertation Defense by LE Hung | Deep Learning for Video-grounded Dialogue Systems

Deep Learning for Video-grounded Dialogue Systems

LE Hung

PhD Candidate
School of Computing and Information Systems
Singapore Management University
 

Dissertation Committee

Research Advisor
  • Steven Hoi, Professor, Singapore Management University

Co-Research Advisor
  • Nancy F. Chen, Senior Scientist, Agency for Science, Technology and Research (A*STAR)

Committee Members

External Member
  • Dr Mike Z SHOU, Assistant Professor, National University of Singapore
 
Date

28 January 2022 (Friday)

Time

12:00pm - 1:00pm

Venue

This is a virtual seminar. Please register by 27 January; the Zoom link will be sent to registered attendees on the following day.

We look forward to seeing you at this research seminar.

 
About The Talk

In recent years, we have witnessed significant progress in building artificially intelligent systems. However, despite advances in machine learning and deep learning, we are still far from autonomous agents that can perceive multi-dimensional information from the surrounding world and converse with humans in natural language. Towards this goal, this thesis is dedicated to building intelligent systems for the task of video-grounded dialogue.

Specifically, in a video-grounded dialogue, a system is required to hold a multi-turn conversation with a human about the content of a video. Given an input video, a dialogue history, and a question about the video, the system has to understand the contextual information of the dialogue, extract relevant information from the video, and construct a response that is both contextually relevant and video-grounded. Compared with related research domains in computer vision and natural language processing, the video-grounded dialogue task raises challenging requirements, including:

(1) language reasoning over multiple turns: the ability to understand contextual information from dialogues, which often contain linguistic dependencies from turn to turn;
(2) visual reasoning in spatio-temporal space: the ability to extract information from videos, which contain both spatial and temporal variations that characterize object appearance and actions; and
(3) language generation: the ability to produce natural-language responses that are both contextually relevant and video-grounded.
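The task setup described above can be sketched as a simple interface: per-turn inputs are the video, the dialogue history, and the current question, and the output is a response. The sketch below is purely illustrative; every name in it is an assumption for exposition and none of it comes from the thesis itself.

```python
# Illustrative sketch of the video-grounded dialogue task interface.
# All names (DialogueState, respond, etc.) are hypothetical, not from the thesis.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DialogueState:
    # Placeholder per-frame visual features standing in for a real video encoding.
    video_features: List[List[float]]
    # Multi-turn history as (question, answer) pairs.
    history: List[Tuple[str, str]] = field(default_factory=list)

def respond(state: DialogueState, question: str) -> str:
    """A video-grounded dialogue system must (1) reason over the multi-turn
    history, (2) ground the question in the video's spatio-temporal features,
    and (3) generate a response. This stub only mirrors that structure; a
    real model would replace the body with learned components."""
    turn = len(state.history) + 1
    return f"[turn {turn}] response grounded in {len(state.video_features)} frames"

# Usage: one turn of the conversation.
state = DialogueState(video_features=[[0.0] * 4 for _ in range(8)])
answer = respond(state, "What is the person doing?")
state.history.append(("What is the person doing?", answer))
```

The point of the sketch is only to make the input/output contract concrete: each turn conditions on the full history and the video, which is what distinguishes this task from single-turn video question answering.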

Throughout this thesis, we introduce approaches that address one or more of the above challenges in building video-grounded dialogue systems. Specifically, we propose deep learning-based solutions spanning novel neural network architectures, model optimization, data augmentation, and diagnostic tasks. We hope that our contributions and the insights from this thesis will facilitate the future development of intelligent multimodal dialogue systems.

 
Speaker Biography

Hung Le is currently a fourth-year PhD student in Computer Science at Singapore Management University. He is advised by Professor Steven Hoi (SMU; Salesforce Research) and Dr Nancy Chen (A*STAR I2R, Human Language Technology Group). Hung is passionate about machine learning research and applications, specifically conversational AI, video understanding, and task-oriented dialogues. He has published as first author at established academic conferences such as EMNLP, ICLR, ACL, and AAAI.

Hung was awarded the A*STAR Computer and Information Scholarship (ACIS) to pursue a PhD with a focus on Deep Learning. During his PhD, he received the Presidential Doctoral Fellowship for three consecutive academic years (2019-2021) and the Dean's List award in 2021. Between academic terms, Hung completed research internships at industry AI labs, including Facebook AI Research (now Meta AI Research) and Salesforce Research Asia.