Pre-Conference Talks
by CHIA Chong Cher and LIM Jia Peng
DATE : 14th Nov 2022, Monday
TIME : 3.00pm to 3.40pm
VENUE : Meeting room 5.1, Level 5, School of Computing and Information Systems, Singapore Management University, 80 Stamford Road, Singapore 178902.
Please register by 13th Nov 2022.

There are two talks in this session; each talk is approximately 20 minutes.
About the Talk(s)
Talk #1: Morphologically-Aware Vocabulary Reduction of Word Embeddings
by CHIA Chong Cher, PhD Candidate
for The 21st IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT'22)
In this presentation, we will discuss SubText, a word embedding compression mechanism via vocabulary reduction. The crux is to judiciously select a subset of word embeddings that supports the reconstruction of the remaining word embeddings based on their form alone. The proposed algorithm considers the preservation of the original embeddings, as well as a word’s relationship to other words that are morphologically or semantically similar. Comprehensive evaluation of the compressed vocabulary demonstrates SubText’s efficacy on diverse tasks over traditional vocabulary reduction techniques, validated on English as well as on a collection of inflected languages.
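To make the idea of form-based reconstruction concrete, here is a minimal, hypothetical sketch. It is not the SubText algorithm itself: it simply approximates a dropped word's embedding as an overlap-weighted average of retained words that share character trigrams with it. The function names, the trigram choice, and the weighting scheme are illustrative assumptions.

```python
# Illustrative sketch only (not the SubText algorithm): reconstruct a dropped
# word's embedding from retained words that share character n-grams with it.
import numpy as np

def char_ngrams(word, n=3):
    """Character trigrams of a word, padded with boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def reconstruct(word, kept_vocab, kept_vectors):
    """Approximate a dropped word's vector as the n-gram-overlap-weighted
    average of retained words' vectors (a stand-in for form-based mapping)."""
    target = char_ngrams(word)
    weights, vectors = [], []
    for w, v in zip(kept_vocab, kept_vectors):
        overlap = len(target & char_ngrams(w))
        if overlap > 0:
            weights.append(overlap)
            vectors.append(v)
    if not weights:
        return np.zeros(kept_vectors.shape[1])  # no morphological neighbour found
    weights = np.array(weights, dtype=float)
    return (weights[:, None] * np.array(vectors)).sum(axis=0) / weights.sum()

# Toy usage: "walked" is reconstructed from the retained "walk" and "talked".
kept_vocab = ["walk", "talked", "run"]
kept_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(reconstruct("walked", kept_vocab, kept_vectors))
```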
Talk #2: Towards Reinterpreting Neural Topic Models via Composite Activations
by LIM Jia Peng, PhD Student
for The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
Most Neural Topic Models (NTMs) use a variational auto-encoder framework producing K topics, limited to the size of the encoder’s output. These topics are interpreted through the selection of the top activated words via the weights or reconstructed vector of the decoder that are directly connected to each neuron. In this paper, we present a model-free two-stage process to reinterpret NTMs and derive further insights on the state of the trained model. First, building on the original information from a trained NTM, we generate a pool of potential candidate “composite topics” by exploiting possible co-occurrences within the original set of topics, which decouples the strict interpretation of topics from the original NTM. This is followed by a combinatorial formulation to select a final set of composite topics, which we evaluate for coherence and diversity on a large external corpus.
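As a rough illustration of the two-stage idea (and not the paper's actual formulation), the hypothetical sketch below forms candidate composite topics by merging the top-word lists of topic pairs, then greedily selects a subset that maximises word diversity, standing in for the coherence/diversity objective evaluated on an external corpus. All names and the selection criterion are assumptions made for illustration.

```python
# Illustrative sketch only (not the paper's method): build composite-topic
# candidates from pairs of original topics, then greedily pick a diverse subset.
from itertools import combinations

def composite_candidates(topics, top_n=10):
    """Each candidate merges the top words of two original topics."""
    cands = []
    for (i, a), (j, b) in combinations(enumerate(topics), 2):
        merged = list(dict.fromkeys(a + b))[:top_n]  # de-duplicate, keep order
        cands.append(((i, j), merged))
    return cands

def greedy_select(candidates, k=5):
    """Pick k candidates that maximise coverage of distinct words
    (a simple stand-in for a coherence/diversity objective)."""
    chosen, covered = [], set()
    for _ in range(min(k, len(candidates))):
        best = max(candidates, key=lambda c: len(set(c[1]) - covered))
        chosen.append(best)
        covered |= set(best[1])
        candidates = [c for c in candidates if c is not best]
    return chosen

# Toy usage with three original topics.
topics = [["cell", "gene", "protein"],
          ["market", "stock", "price"],
          ["gene", "dna", "mutation"]]
for ids, words in greedy_select(composite_candidates(topics), k=2):
    print(ids, words)
```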
About the Speaker(s)
Chong Cher is a fifth-year PhD candidate in Computer Science at the SMU School of Computing and Information Systems, supervised by Prof. Hady W. Lauw. His research focuses on the effectiveness and efficiency of semantic representations.
Jia Peng is a second-year PhD student in Computer Science working on Computational Linguistics. Currently, he is looking for potential user study participants (compensated, of course) for word-grouping studies; please reach out to him at jiapeng.lim.2021@phdcs.smu.edu.sg to register your interest.