Chinese Idiom Understanding with Transformer-based Pretrained Language Models
TAN Minghuan
PhD Candidate
School of Computing and Information Systems
Singapore Management University
Research Area
Dissertation Committee
Research Advisor
Committee Members
External Member
- SUN Aixin, Associate Professor, Nanyang Technological University
Date
19 May 2022 (Thursday)
Time
10:00am - 11:00am
Venue
This is a virtual seminar. Please register by 17 May 2022; the Zoom link will be sent out the following day to those who have registered.
We look forward to seeing you at this research seminar.

About The Talk
In this dissertation, I study the understanding of Chinese idioms using transformer-based pretrained language models. By "understanding", I confine the scope to widely adopted practices such as static word embedding learning, contextualized word representation learning, and conditional text generation.
Chinese idioms are fixed phrases with special meanings, usually derived from ancient stories. The meanings of these idioms are often not directly related to their component characters, which makes them harder to model than standard phrases whose meanings are compositional.
I initiate this work by studying idiom representations derived from pretrained language models. We adopt probing-based methods to investigate to what extent BERT can encode an idiom's meaning, designing two probing tasks to test whether pretrained language models capture idioms' meanings in their encodings. We then propose a BERT-based method to better learn Chinese idiom embeddings and evaluate them on our newly constructed dataset of Chinese idiom synonyms and antonyms. I further study Chinese idiom prediction from context and propose a new task called Chengyu-oriented text polishing.
Finally, I conclude by summarizing the contributions of this dissertation and pointing out potential future directions related to Chinese idiom understanding, namely sentiment analysis with idioms and explaining Chinese Chengyu recommendation models.
Speaker Biography
Minghuan received his bachelor's degree in Mathematics from Shandong University in 2011. Before 2017, he worked as a software engineer at Huawei and Elong. Since joining SMU, his research interests have focused on pretrained language models and their applications. He has several publications on multiword expressions and Chinese idioms in TALLIP, RANLP, SemEval, and COLING. Since July 2021, he has been a research intern at Tencent AI Lab, where he has worked on text generation and multi-modal pretraining.