PhD Dissertation Defense by LE Dinh Xuan Bach | Overfitting in Automated Program Repair: Challenges and Solutions

Overfitting in Automated Program Repair: Challenges and Solutions

LE Dinh Xuan Bach

PhD Candidate

School of Information Systems

Singapore Management University

Research Area

Software & Cyber-Physical Systems

Dissertation Committee

Chairman

Associate Prof. David LO

Committee Members

External Member

Willem Conradie Visser, Professor, Stellenbosch University

Date

May 15, 2018 (Tuesday)

Time

3.00pm - 4.00pm

Venue

Meeting Room 5.1, Level 5,

School of Information Systems, Singapore Management University

80 Stamford Road

Singapore 178902

We look forward to seeing you at this research seminar.

About The Talk

Bug fixing is time-consuming and costly. Automated program repair (APR) techniques that relieve human developers of part of this burden would therefore be of tremendous value. A substantial body of recent work has proposed techniques that automatically repair a variety of bugs in large, real-world software, gradually turning the futuristic idea of APR into reality. These techniques differ in how they search for repairs, but they commonly rely on test cases both to guide the repair process and to validate machine-generated patches. This reliance is problematic because test cases are known to be incomplete, in the sense that they often encode the desired behavior of software only partially. As a result, APR techniques may generate patches that overfit to the test cases used for repair and do not generalize to the behavior that developers expect.

To overcome this problem, often referred to as patch overfitting, APR techniques must address the following challenges: (1) maintaining both scalability and tractability, so that they scale cheaply to large, real-world programs while still handling the large repair search space of those programs well enough to find correct repairs; (2) enhancing expressive power, so that they correctly fix many more real bugs from diverse real-world programs; and (3) establishing methodologies to validate machine-generated patches. This dissertation tackles these challenges by (1) proposing new search-based and semantics-based APR techniques that are capable of generating generalizable repairs, (2) empirically studying the overfitting issue in semantics-based APR, complementing existing studies of its search-based counterparts, and (3) empirically evaluating the reliability of patch validation methodologies, providing guidelines on how machine-generated patches should be evaluated.

In particular, we propose HDRepair, a search-based APR technique that leverages the development history of many software projects to guide and drive the repair process. We empirically study various characteristics of different semantics-based APR techniques, showing that these techniques are indeed subject to overfitting to varying degrees. We subsequently propose S3, a semantics-based APR technique that systematically constrains the syntactic search space for repairs and effectively ranks candidate solutions to find correct repairs. Finally, we study the reliability of popular existing patch validation methodologies and provide several guidelines and insights on how APR-generated patches should be evaluated.
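
The patch overfitting problem described above can be illustrated with a toy sketch. It is not taken from the dissertation and does not reflect how HDRepair or S3 work; every name in it (`buggy_max`, `overfitting_patch`, `correct_patch`, `passes`, and both test lists) is a hypothetical chosen for illustration. The sketch assumes a repair tool that accepts any candidate passing the repair test suite, and shows that such a candidate can still fail on held-out tests that encode more of the intended behavior.

```python
from typing import Callable, List, Tuple

# A test case pairs the inputs (a, b) with the expected output.
TestCase = Tuple[Tuple[int, int], int]


def buggy_max(a: int, b: int) -> int:
    # Original program with an inverted comparison: returns the minimum.
    return a if a < b else b


def overfitting_patch(a: int, b: int) -> int:
    # Hypothetical machine-generated patch: always returns the second argument.
    # It satisfies the repair tests only because b happens to be the larger
    # (or equal) value in each of them.
    return b


def correct_patch(a: int, b: int) -> int:
    # Patch that matches the intended behavior on all inputs.
    return a if a > b else b


def passes(program: Callable[[int, int], int], tests: List[TestCase]) -> bool:
    # A program passes a suite when it yields the expected output on every test.
    return all(program(*inputs) == expected for inputs, expected in tests)


# Incomplete suite used to guide and validate repair: b is never smaller than a.
REPAIR_TESTS: List[TestCase] = [((1, 2), 2), ((3, 5), 5), ((0, 0), 0)]

# Held-out suite standing in for the behavior developers actually expect.
HELD_OUT_TESTS: List[TestCase] = [((5, 3), 5), ((-1, -4), -1)]

if __name__ == "__main__":
    candidates = [
        ("buggy", buggy_max),
        ("overfitting", overfitting_patch),
        ("correct", correct_patch),
    ]
    for name, program in candidates:
        print(name, passes(program, REPAIR_TESTS), passes(program, HELD_OUT_TESTS))
```

Evaluating candidates against tests that were not used during repair, as done with `HELD_OUT_TESTS` here, is one simple way to expose a patch that merely fits the repair suite.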

Speaker Biography

Xuan-Bach D. LE is currently a final-year PhD candidate in SIS, SMU. His research focuses on software engineering, particularly program analysis, repair, synthesis, and verification. Prior to joining SMU, he was a research assistant in the School of Computing, National University of Singapore. He obtained his bachelor’s degree from Hanoi University of Science and Technology, Vietnam. He will be joining CyLab, Carnegie Mellon University, in June 2018 to work on software security.
