With the rapid development of online shopping sites and social media, product reviews are accumulating quickly. These reviews reflect people’s opinions of products and contain information that is valuable to both businesses and customers. Businesses can easily collect a large amount of feedback on their products, which is difficult to achieve with traditional customer surveys. Customers can learn more about the products they are interested in by reading reviews, which would be hard to do without them. Product reviews are playing increasingly important roles in areas such as shopping decision making and product design and development. However, the sheer volume of reviews makes it impossible for anyone to read them all. It is therefore necessary to develop automated techniques that can accurately process product reviews on behalf of human readers.
One of the most fundamental research problems in product review analysis is aspect discovery. Aspects are components or attributes of a product or service. Aspect discovery is the task of finding the relevant terms and then clustering them into aspects. As users often evaluate products based on aspects, presenting them with aspect-level analysis is essential. Aspect discovery also serves as the basis of many downstream applications, such as aspect-level opinion summarization, rating prediction, and product recommendation. Aspect-level analysis and these downstream applications can only be done well when aspect discovery is accurate.
There are three basic steps in aspect discovery. The first is defining the aspects we need. In this step, we need to understand and determine what counts as an aspect. The second is identifying the words that are used to describe aspects. This step helps us concentrate on the information that is most relevant to aspect discovery. The third is clustering words into aspects. The main goal of this step is to place words that describe the same aspect into the same group. All three problems are closely connected with each other.
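The three steps above can be sketched on toy data as follows. This is only an illustrative baseline and not one of the models proposed in this dissertation: the reviews, the seed words in `SEEDS`, the stopword list, and the function names are all hypothetical, and the frequency filter and co-occurrence counting stand in for the far richer methods discussed here.

```python
from collections import Counter, defaultdict

# Toy reviews, invented for illustration only.
REVIEWS = [
    "the battery life is great and the battery charges fast",
    "the screen is bright but the battery drains quickly",
    "the screen resolution is sharp and the display is bright",
    "the display cracked and the screen flickers",
]

# Step 1: define the aspects of interest through hypothetical seed words.
SEEDS = {"power": "battery", "display": "screen"}

STOPWORDS = {"the", "is", "and", "but", "a"}


def candidate_words(reviews, min_count=2):
    """Step 2: keep non-stopwords that occur at least min_count times."""
    counts = Counter(
        w for r in reviews for w in r.split() if w not in STOPWORDS
    )
    return {w for w, c in counts.items() if c >= min_count}


def cluster_by_cooccurrence(reviews, seeds, candidates):
    """Step 3: assign each candidate to the seed it co-occurs with most."""
    cooc = defaultdict(Counter)
    for r in reviews:
        words = set(r.split())
        for aspect, seed in seeds.items():
            if seed in words:
                for w in words & candidates:
                    cooc[w][aspect] += 1
    clusters = {aspect: set() for aspect in seeds}
    for w in candidates:
        if cooc[w]:
            # Sorting first makes tie-breaking deterministic (alphabetical).
            best = max(sorted(cooc[w]), key=lambda a: cooc[w][a])
            clusters[best].add(w)
    return clusters


candidates = candidate_words(REVIEWS)
clusters = cluster_by_cooccurrence(REVIEWS, SEEDS, candidates)
```

On this toy data the word "bright" ends up in the display cluster even though it is an opinion word rather than an aspect word, which shows how crude a purely frequency-based and co-occurrence-based pipeline can be.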
Much existing work addresses these three steps in different ways, but some limitations remain. In the first step, most existing studies assume that the latent topics they discover are aspects that people use to evaluate products. However, besides aspects, there is another type of latent topic in product reviews, which we call “properties”. Properties are attributes that are intrinsic to products and are not suitable for comparing different products. In the second step, many supervised learning models have been proposed to identify aspect words. While proven effective, they require large amounts of training data and become much less useful when applied to data from a different domain. For the third step, many extensions of LDA have been proposed for clustering aspect words. Most of them rely only on the co-occurrence statistics of words. This overlooks the semantic meanings of words, which should be considered in modeling product reviews.
In this dissertation, we propose several new models to address some of the remaining problems of existing work:
1. We propose a principled model to separate product properties from aspects and connect both of them with ratings. Our model can effectively do the separation and its output can help us understand users’ shopping behaviors and preferences better.
2. We design two Recurrent Neural Network (RNN) based models that incorporate domain-independent rules into domain-specific supervised neural networks. With the help of syntactic rules, our models improve considerably over several strong existing baselines in the task of cross-domain aspect word identification.
3. We use word embeddings to boost traditional topic modeling of product reviews. Instead of treating words as discrete signals, we use their vector representations to model their semantic meanings. The proposed model is more effective both in discovering meaningful aspects and in recommending products to users.
4. We propose a model that integrates an RNN with a Neural Topic Model (NTM) to jointly identify and cluster aspect words. Our model is able to discover clearer and more coherent aspects. It is also more effective in sentence clustering than the baselines.
About the Speaker
DING Ying is a PhD candidate in the School of Information Systems, Singapore Management University, under the supervision of Assistant Professor JIANG Jing. He works in the area of text mining and recommender systems. His primary research interest is in using advanced techniques in natural language processing and machine learning to analyze product reviews. He has also explored text summarization, topic modelling and social media mining.