CCBERT: Self-Supervised Code Change Representation Learning
Speaker (s):

ZHOU Xin
PhD Candidate,
School of Computing and Information Systems
Singapore Management University
Date: 29 September 2023, Friday
Time: 2:00pm – 2:30pm
Venue: Meeting room 5.1, Level 5, School of Computing and Information Systems 1, Singapore Management University, 80 Stamford Road, Singapore 178902
We look forward to seeing you at this research seminar.
Please register by 28 September 2023.

About the Talk
Developers make numerous code changes in their daily work, and effective code change analysis depends on a good representation of those changes. Recently, Hoang et al. proposed CC2Vec, a neural network-based approach that learns a distributed representation of code changes to capture their semantic intent. Despite its demonstrated effectiveness in multiple tasks, CC2Vec has several limitations: 1) it considers only coarse-grained information about code changes, and 2) it relies on log messages rather than the self-contained content of the code changes. In this work, we propose CCBERT (Code Change BERT), a new Transformer-based pre-trained model that learns a generic representation of code changes from a large-scale dataset of unlabeled code changes. CCBERT is pre-trained on four proposed self-supervised objectives that are specialized for learning code change representations from the contents of the changes themselves. CCBERT perceives fine-grained code changes at the token level by learning from the old and new versions of the content, along with the edit actions. Our experiments demonstrate that CCBERT significantly outperforms CC2Vec and the state-of-the-art approaches on the downstream tasks by 7.7% to 14.0%, depending on the metric and task. CCBERT also consistently outperforms large pre-trained code models, such as CodeBERT, while requiring 6 to 10 times less training time, 5 to 30 times less inference time, and 7.9 times less GPU memory.
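To make the idea of a token-level view of a change more concrete (old version, new version, and an edit action per token), here is a minimal Python sketch. It is an illustration only and not CCBERT's actual preprocessing: the regex tokenizer, the function names `tokenize` and `token_edit_actions`, the action labels, and the use of the standard-library `difflib` aligner are all assumptions made for this example.

```python
import difflib
import re


def tokenize(code: str) -> list[str]:
    # Naive tokenizer (assumption): identifiers, integers, or single symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def token_edit_actions(old_code: str, new_code: str) -> list[tuple[str, str, str]]:
    """Align old and new tokens and label each position with an edit action.

    Returns (action, old_token, new_token) triples, where action is one of
    "keep", "delete", "insert", or "replace" (hypothetical labels), and an
    empty string marks a missing token on one side.
    """
    old_tokens, new_tokens = tokenize(old_code), tokenize(new_code)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    aligned = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            aligned += [("keep", t, t) for t in old_tokens[i1:i2]]
        elif tag == "delete":
            aligned += [("delete", t, "") for t in old_tokens[i1:i2]]
        elif tag == "insert":
            aligned += [("insert", "", t) for t in new_tokens[j1:j2]]
        else:  # "replace": pair tokens position by position, pad the shorter side
            old_span, new_span = old_tokens[i1:i2], new_tokens[j1:j2]
            for k in range(max(len(old_span), len(new_span))):
                o = old_span[k] if k < len(old_span) else ""
                n = new_span[k] if k < len(new_span) else ""
                action = "replace" if o and n else ("delete" if o else "insert")
                aligned.append((action, o, n))
    return aligned
```

For instance, `token_edit_actions("x = 1", "x = 2")` yields two `keep` entries for `x` and `=` followed by one `replace` entry pairing `1` with `2`. A model in the spirit of the talk would consume such aligned triples as input; how CCBERT actually encodes them is not specified in this announcement.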
This is a pre-conference talk for the 39th IEEE International Conference on Software Maintenance and Evolution (ICSME 2023).
About the Speaker
ZHOU Xin is a Ph.D. student in SCIS, under the supervision of Prof. David LO. His research focuses on pre-trained code representation and automation for software maintenance and development.