PhD Dissertation Proposal by LE Van Minh Tuan

Probabilistic Models for Semantic Visualization and Its Applications

Speaker(s):

LE Van Minh Tuan

PhD Candidate

School of Information Systems

Singapore Management University

Date: November 24, 2016, Thursday

Time: 10:00 am - 11:00 am

Venue: Meeting Room 4.4, Level 4, School of Information Systems, Singapore Management University, 80 Stamford Road, Singapore 178902

We look forward to seeing you at this research seminar.

About the Talk

Visualizing high-dimensional data, such as text documents, helps map out the similarities among data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the vocabulary size. Classical approaches to document visualization reduce this representation directly to two or three dimensions that can be visualized. Recent approaches introduce an intermediate representation in topic space, between word space and visualization space, which preserves semantics through topic modeling. These approaches address the problem of semantic visualization, which attempts to jointly model visualization and topics. They assume that both documents and topics have latent coordinates in the visualization space. The topic distribution of a document is then determined by the distances between the document and the different topics. With semantic visualization, documents with similar topics are displayed near each other even if they do not share any words. This dissertation focuses on building probabilistic models for semantic visualization by modeling other aspects of documents in addition to their text. The dissertation includes three main parts:

Modeling document relationship.

Previous semantic visualization models assume that documents are sampled independently and have no relationship to one another. This assumption is not appropriate when documents exhibit some relationship, such as a neighborhood structure or a network structure. Though semantic similarities inferred from text can partly compensate for this strict assumption, we show that this is not enough: the document relationships still carry information that needs to be preserved in the visualization. To address this problem, we propose two semantic visualization models. The first, SEMAFORE, models neighborhood structure. The second, PLANE, models networked documents.

Modeling document representation.

Most visualization models represent documents as bags of word counts. This type of representation is sensitive to document length and cannot model word absences. In contrast, a spherical representation, in which documents are represented as unit vectors (i.e., L2-normalized vectors), does not suffer from these problems. We therefore propose SSE, a semantic visualization model for spherical representation. Another type of representation is the bag of word vectors. Word vectors are known for their ability to deal with the sparsity problem in short texts. We are currently developing a semantic visualization model that uses word vectors to visualize short texts.
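As a small illustration of why unit vectors are insensitive to document length, the sketch below L2-normalizes two word-count vectors that have the same word proportions but different lengths. The vectors and the helper names (l2_normalize, cosine) are made up for this example and are not taken from SSE.

```python
import math

def l2_normalize(counts):
    """Scale a word-count vector to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [c / norm for c in counts]

def cosine(u, v):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical word counts over a four-word vocabulary.
short_doc = [2, 1, 0, 1]
long_doc = [20, 10, 0, 10]  # same proportions, ten times longer

u = l2_normalize(short_doc)
v = l2_normalize(long_doc)

# After normalization the two documents map to the same point on the unit sphere.
print(cosine(u, v))
```

The raw count vectors differ by a factor of ten, while their normalized forms coincide, which is the length invariance the paragraph above refers to.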

Applications of semantic visualization.

We apply semantic visualization to various problems. For single-document visualization, we propose a new framework called WORD FLOCK for the visual comparison of documents using word clouds. In this framework, a semantic visualization method is used to visualize words, which are represented as pseudo-documents. Another application of semantic visualization is the visualization of document collections. We are currently developing a tool for visualizing a document corpus to support interactive topical exploration.
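To make the idea of a distance-based topic distribution from the summary above more concrete, here is a minimal sketch. It assumes a Gaussian-style kernel over squared Euclidean distances, which is one common choice in models of this family; the abstract only states that the distribution depends on distances, so the exact form used in the dissertation may differ. All coordinates and the function name topic_distribution are hypothetical.

```python
import math

def topic_distribution(doc_xy, topic_xys):
    """Turn document-to-topic distances into a probability distribution.

    Assumed form: weight of a topic is exp(-0.5 * squared distance),
    normalized over all topics, so closer topics get more probability.
    """
    weights = [
        math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(doc_xy, t)))
        for t in topic_xys
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 2-D coordinates in the visualization space.
topics = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
doc_a = (0.2, 0.1)  # near topic 0
doc_b = (0.1, 0.3)  # also near topic 0
doc_c = (2.9, 0.2)  # near topic 1

p_a = topic_distribution(doc_a, topics)
p_b = topic_distribution(doc_b, topics)
p_c = topic_distribution(doc_c, topics)

for name, p in (("a", p_a), ("b", p_b), ("c", p_c)):
    print(name, [round(x, 3) for x in p])
```

Under this assumption, documents placed close together in the visualization space receive similar topic distributions regardless of which words they contain, which matches the behavior the summary describes.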

About the Speaker

LE Van Minh Tuan is a PhD candidate in the School of Information Systems, Singapore Management University, under the supervision of Assistant Professor Hady W. Lauw. His current research interests include visualization, dimensionality reduction, topic modeling, and generative models.
