About The Talk

Online reviews are prevalent in many modern Web applications, such as e-commerce sites and online check-in and review platforms. Fueled by the rise of mobile phones, which are often the only cameras at hand, reviews are increasingly multimodal, pairing photos with textual content. In this work, we focus on modeling the subjectivity carried in this form of data, pursuing two main research objectives.

In the first part of this presentation, we discuss the problem of detecting the sentiment expressed by a review. First, we investigate visual sentiment analysis on review images. Second, we observe that, with respect to sentiment detection, images often play a supporting role to text, highlighting the salient aspects of an entity. We therefore propose a visual aspect attention mechanism that uses visual information as an alignment signal to identify the important sentences of a document. Third, we further study the use of sentiment as an independent modality in the context of cross-modal retrieval.
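The core of such a mechanism can be sketched in a few lines: an image feature acts as a query that scores each sentence of the review, and the resulting attention weights pick out the sentences the photo supports. This is a minimal illustrative sketch, not the talk's exact architecture; the function names, shapes, and the single projection matrix `W` are all assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def visual_aspect_attention(sentence_embs, image_feat, W):
    """Score review sentences against a visual query (illustrative sketch).

    sentence_embs : (n_sentences, d_text) sentence representations
    image_feat    : (d_img,) image feature vector, e.g. from a CNN
    W             : (d_img, d_text) projection aligning image and text spaces

    Returns the attention weights over sentences and the
    attention-weighted document representation.
    """
    query = image_feat @ W           # project the image into the text space
    scores = sentence_embs @ query   # dot-product alignment per sentence
    alpha = softmax(scores)          # attention weights over sentences
    doc_rep = alpha @ sentence_embs  # weighted document vector
    return alpha, doc_rep
```

In this formulation, sentences whose embeddings align with the projected image feature receive higher weight, so the photo effectively highlights which parts of the text carry the sentiment-bearing aspects.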
In the second part of this presentation, we introduce methods for modeling user preferences from such multimodal data. In online reviews, preference manifests in numerical ratings, textual content, and images. First, we hypothesize that modeling these modalities jointly yields a more holistic representation of a review and, in turn, more accurate recommendations. Second, we introduce a new generative model of preferences, inspired by the variational autoencoder architecture, that is bilateral in nature: users and items are treated symmetrically, making it better suited to user-item interaction data and alleviating the sparsity issue faced by traditional collaborative filtering methods.
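The bilateral idea can be illustrated with a minimal forward pass: one variational encoder reads a user's row of the interaction matrix, a symmetric encoder reads an item's column, and a rating is decoded from the two latent samples. This is a hedged sketch under simple assumptions (linear encoders, an inner-product decoder, made-up parameter names), not the model presented in the talk.

```python
import numpy as np

def encode(x, W_mu, W_logvar):
    """Linear variational encoder: map an interaction vector to the
    mean and log-variance of a Gaussian latent (illustrative)."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def predict_rating(R, u, i, params, rng):
    """Bilateral forward pass: the user encoder reads row u of the
    user-item matrix R, the item encoder symmetrically reads column i,
    and the predicted rating is the inner product of the two latents.
    All parameter names here are assumptions for illustration."""
    zu = reparameterize(*encode(R[u, :], params["Wu_mu"], params["Wu_lv"]), rng)
    zi = reparameterize(*encode(R[:, i], params["Wi_mu"], params["Wi_lv"]), rng)
    return zu @ zi
```

Because every prediction draws on an encoding of the full row and the full column, a user or item with few observed interactions still receives a latent informed by the whole matrix, which is one way the symmetric treatment can ease sparsity relative to factorizing each entry independently.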