Chinese Idiom Understanding with Transformer-based Pretrained Language Models
TAN Minghuan
PhD Candidate
School of Computing and Information Systems
Singapore Management University
Research Area
Dissertation Committee
Research Advisor
Committee Members
External Member
- SUN Aixin, Associate Professor, Nanyang Technological University
Date
19 May 2022 (Thursday)
Time
10:00am - 11:00am
Venue
This is a virtual seminar. Please register by 17 May 2022; the Zoom link will be sent out the following day to those who have registered.
We look forward to seeing you at this research seminar.

About The Talk
In this dissertation, I study the understanding of Chinese idioms using transformer-based pretrained language models. By "understanding", I confine the scope to widely adopted practices such as static word embedding learning, contextualized word representation learning, and conditional text generation.
Chinese idioms are fixed phrases with special meanings, usually derived from ancient stories. The meanings of these idioms are often not directly related to their component characters, which makes them harder to model than standard phrases whose meanings are compositional.
I initiate this work by studying idiom representations derived from pretrained language models. We adopt probing-based methods to investigate to what extent BERT can encode an idiom's meaning, designing two probing tasks to test whether pretrained language models capture idioms' meanings in their encodings. We then propose a BERT-based method to better learn Chinese idiom embeddings and evaluate them on our newly constructed dataset of Chinese idiom synonyms and antonyms. I further study Chinese idiom prediction from context and propose a new task called Chengyu-oriented text polishing.
Finally, I conclude by summarizing the contributions of this dissertation and pointing out potential future directions related to Chinese idiom understanding, namely sentiment analysis with idioms and explaining Chinese Chengyu recommendation models.
Speaker Biography
Minghuan received his bachelor's degree in Mathematics from Shandong University in 2011. Before 2017, he worked as a software engineer at Huawei and Elong. Since joining SMU, his research interests have focused on pretrained language models and their applications. He has several publications on multiword expressions and Chinese idioms in TALLIP, RANLP, SemEval, and COLING. Since July 2021, he has been a research intern at Tencent AI Lab, where he has worked on text generation and multi-modal pretraining.