About The Talk
In recent years, software question and answer (SQA) sites have grown rapidly and have become an essential part of developers' day-to-day work, serving purposes such as problem-solving and self-learning. Stack Overflow is the largest and most popular SQA site. According to its latest developer survey, as of July 2020, it has over 13 million registered users and 20 million questions. Moreover, about 50 million people visit Stack Overflow each month, and more than 7,500 questions are asked every day. The huge amount of data in SQA sites constitutes a core knowledge asset for the software engineering domain. Building on this data, a large number of machine learning approaches have been proposed to interpret and make use of the knowledge these sites contain. For example, some works aim to improve the SQA sites themselves through better relevant question retrieval, post classification, and tag recommendation. Other works leverage SQA data to automate software development activities, e.g., program repair and refactoring.
In this dissertation proposal, building on the body of work that analyzes SQA sites, we propose to tackle three research problems: (1) linking information in SQA sites, (2) representing content in SQA sites, and (3) summarizing content in SQA sites.
Speaker Biography
Bowen Xu is a third-year Ph.D. candidate in the School of Information Systems, Singapore Management University, advised by Associate Professor David Lo. His research focuses on leveraging machine learning techniques for interpreting knowledge from software question and answer sites.