PhD Dissertation Defense by CAO Rui | Using Pre-trained Models for Vision-Language Understanding Tasks


Using Pre-trained Models for Vision-Language Understanding Tasks

CAO Rui

PhD Candidate   
School of Computing and Information Systems   
Singapore Management University   
 


Dissertation Committee

Research Advisor

  • Jing JIANG, Professor, School of Computing and Information Systems, Singapore Management University

Dissertation Committee Member

External Member

  • Roy Ka-Wei LEE, Assistant Professor, Design & Artificial Intelligence Programme, Singapore University of Technology and Design
 

Date

2 May 2024 (Thursday)

Time

9:00am – 10:00am

Venue

Meeting Room 5.1, Level 5   
School of Computing and Information Systems 1, Singapore Management University, 80 Stamford Road, Singapore 178902

Please register by 1 May 2024.

We look forward to seeing you at this research seminar.

 

ABOUT THE TALK

In recent years, remarkable progress has been made in Artificial Intelligence (AI), with an increasing focus on integrating AI systems into people’s daily lives. Given the multimodal nature of the real world, research attention has shifted towards applying AI to multimodal understanding tasks. This thesis focuses on two key modalities, vision and language, and explores Vision-Language Understanding (VLU).

In the past, VLU tasks were addressed by training distinct models from scratch on task-specific data. With limited training data, however, such models easily overfit and fail to generalize. A recent breakthrough is the development of Pre-trained Models (PTMs), which are trained on extensive datasets to acquire universal representations. Leveraging PTMs for VLU tasks has since become the prevalent approach.

The use of PTMs for VLU tasks falls into two paradigms: (1) fine-tuning PTMs with downstream task data, and (2) zero-shot transfer or few-shot learning based on frozen PTMs. Existing methods under both paradigms suffer from limitations: direct fine-tuning of PTMs may overlook the unique characteristics of the downstream task; the zero-shot and few-shot performance of PTMs on some tasks may be poor; and complex VLU tasks may require multiple reasoning skills that no single PTM possesses.
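The distinction between the two paradigms can be sketched in PyTorch with a toy model standing in for a PTM. The backbone and classifier head below are illustrative placeholders (not the actual models used in the thesis): fine-tuning keeps all backbone parameters trainable, while the frozen-PTM paradigm disables their gradients so that at most a lightweight head is trained.

```python
from torch import nn

# Illustrative stand-in for a pre-trained vision-language backbone (not a real PTM).
backbone = nn.Linear(512, 256)
head = nn.Linear(256, 2)  # e.g. a hateful / non-hateful classifier head

# Paradigm (1): fine-tuning -- all backbone parameters remain trainable
# and are updated with downstream task data.
finetune_trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)

# Paradigm (2): zero-shot transfer / few-shot learning with a frozen PTM --
# backbone gradients are disabled; only the small head (if any) would be trained.
for p in backbone.parameters():
    p.requires_grad = False
frozen_trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)

print(finetune_trainable, frozen_trainable)  # trainable backbone params before vs. after freezing
```

In practice the frozen backbone would be a large PTM such as CLIP, and freezing it is what makes zero- and few-shot transfer cheap; the trade-off is that a frozen PTM cannot adapt its representations to the downstream task.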

In this thesis, we aim to address the above limitations by optimizing the utilization of PTMs for VLU tasks, using hateful meme detection and visual question answering as testbeds.

ABOUT THE SPEAKER

Rui CAO is a PhD Candidate in the School of Computing and Information Systems, Singapore Management University, supervised by Prof. Jing JIANG. Her research interest is vision-language understanding, with a specific focus on Visual Question Answering and Hateful Meme Detection.