About The Talk
Software engineering corpora, collected from large software systems (e.g., macOS, Ubuntu, and Firefox), differ from natural language corpora. Specifically, software engineering corpora include not only natural language, used by humans, but also programming language, used by machines. Software engineering corpora have been heavily studied in the last decade and used to solve many software engineering problems, e.g., tag recommendation, duplicate bug report detection, and Android application profiling. In this dissertation, we take advantage of software engineering corpora to detect bugs in software systems. Specifically, we aim to solve three main software engineering problems: bug localization, just-in-time defect prediction, and bug fixing patch identification.
In this dissertation, we aim to (1) propose a model that takes advantage of bug report similarity and method similarity graphs to localize bugs effectively; and (2) propose a deep learning model that automatically extracts code change features by leveraging the semantic and syntactic structure of the actual code changes to detect bugs in commits. While (1) addresses the bug localization problem, (2) addresses the just-in-time defect prediction and bug fixing patch identification problems in the software engineering community. Our contributions in this dissertation proposal are as follows: (1) Bug localization: We propose a new approach, named Network-clustered Multi-modal Bug Localization (NetML), which utilizes multi-modal information from both bug reports and program spectra to localize bugs. NetML achieves effective bug localization by jointly optimizing the bug localization error and the clustering of both bug reports and program elements (i.e., methods). (2) Just-in-time defect prediction: We propose an end-to-end deep learning framework, named DeepJIT, that automatically extracts features from commit messages and code changes and uses them to identify defects. (3) Bug fixing patch identification: We propose a hierarchical deep learning-based approach, named PatchNet, that automatically extracts features from commit messages and commit code and uses them to identify stable patches. Unlike DeepJIT, PatchNet has a deep hierarchical structure that mirrors the hierarchical and sequential structure of commit code, which distinguishes it from existing deep learning models of source code.