Pre-conference Talks for ICSE, MSR, and ICPC 2023

DATE :	20 April 2023, Thursday
TIME :	3:30pm to 5:30pm
VENUE :	Seminar Room 4.4, Level 4, School of Economics/School of Computing and Information Systems 2 (SOE/SCIS2), Singapore Management University, 90 Stamford Road, Singapore 178903

Please register by 19 April 2023.

There are 7 talks in this session; each talk is approximately 20 minutes long.

About the Talks

Talk #1: Generation-based Code Review Automation: How Far Are We?
by ZHOU Xin, PhD Candidate
for 31st IEEE/ACM International Conference on Program Comprehension (ICPC 2023)

A number of generation-based automatic code review (ACR) approaches have been proposed recently to automate various activities in the code review process. We find that previous works have several key limitations. First, existing ACR approaches have not been comprehensively compared with each other, so their superiority over peer ACR approaches has not been shown. Second, prior works rely heavily on the Exact Match (EM) metric, which counts only perfect predictions and ignores the partial progress made by incomplete answers. To fill this research gap, we conduct a comprehensive study comparing the effectiveness of recent ACR tools. The results show that CodeT5 outperforms the other models in most cases. In addition, we introduce a new metric, Edit Progress (EP), to quantify the partial progress made by ACR tools. The results show that the ranking of models for each task can change depending on whether EM or EP is used. Lastly, we derive several insightful lessons and outline future research directions for generation-based code review automation.
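To make the difference between the two metrics concrete, here is a minimal Python sketch. It assumes, for illustration only, that Edit Progress is the normalized reduction in token-level edit distance to the reference revision, EP = (d(before, ref) - d(pred, ref)) / d(before, ref). This formula, the edge-case handling, and the function names (`edit_distance`, `exact_match`, `edit_progress`) are hypothetical and are not taken from the talk.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # substitute or keep
        prev = curr
    return prev[len(b)]


def exact_match(pred: str, ref: str) -> float:
    """EM: 1.0 only when the prediction equals the reference token for token."""
    return 1.0 if pred.split() == ref.split() else 0.0


def edit_progress(before: str, pred: str, ref: str) -> float:
    """Hypothetical EP: fraction of the required edits that the prediction completes.

    Positive when the prediction moves closer to the reference, 0.0 when it
    makes no progress, and negative when it moves further away.
    """
    src, hyp, tgt = before.split(), pred.split(), ref.split()
    needed = edit_distance(src, tgt)
    if needed == 0:
        # Assumed edge case: no edit was required, so only an exact output counts.
        return 1.0 if hyp == tgt else 0.0
    return (needed - edit_distance(hyp, tgt)) / needed


# Two token edits are required (b -> x, d -> y); the prediction makes one of them.
print(edit_progress("a b c d", "a x c d", "a x c y"))  # 0.5
print(exact_match("a x c d", "a x c y"))               # 0.0
```

The partially correct prediction scores 0.5 under this EP sketch but 0.0 under EM, which is the kind of partial progress that EM ignores.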

Talk #2: TECHSUMBOT: A Stack Overflow Answer Summarization Tool for Technical Query
by YANG Chengran, PhD Candidate
for 45th International Conference on Software Engineering (ICSE 2023)

Stack Overflow is a popular platform for developers seeking solutions to programming-related problems. However, prior studies found that developers may struggle with the redundant, useless, and incomplete information retrieved by the Stack Overflow search engine. To help developers make better use of Stack Overflow knowledge, researchers have proposed tools that summarize the answers to a Stack Overflow question. However, existing tools use hand-crafted features to assess the usefulness of each answer sentence and fail to remove semantically redundant information from the result. Besides, existing tools focus on only one programming language and cannot retrieve newly posted, up-to-date knowledge from Stack Overflow. In this paper, we propose TECHSUMBOT, a tool that automatically generates answer summaries for technical questions. Given a question, TECHSUMBOT first retrieves answers using the Stack Overflow search engine, then 1) ranks each answer sentence by its usefulness, 2) estimates the centrality of each sentence with respect to all candidate sentences, and 3) removes semantically redundant information. Finally, TECHSUMBOT returns the top 5 ranked answer sentences as the answer summary. We implement TECHSUMBOT as a search engine website. To evaluate TECHSUMBOT both automatically and manually, we construct the first Stack Overflow multi-answer summarization benchmark and design a user study to assess the effectiveness of TECHSUMBOT against state-of-the-art baselines from the NLP and SE domains. Both evaluations indicate that the summaries generated by TECHSUMBOT are more diverse, more useful, and more similar to the ground truth summaries.
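The ranking and filtering stages described above can be illustrated with a small Python sketch. It is not TECHSUMBOT's implementation: the per-sentence usefulness scores are assumed to be given, a bag-of-words cosine similarity stands in for whatever sentence representation the tool actually uses, and the additive scoring and the 0.8 redundancy threshold are hypothetical choices.

```python
import math
from collections import Counter
from typing import List


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for learned sentence representations)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def summarize(sentences: List[str], usefulness: List[float],
              k: int = 5, redundancy_threshold: float = 0.8) -> List[str]:
    """Hypothetical pipeline: usefulness + centrality ranking, then redundancy removal."""
    n = len(sentences)
    scores = []
    for i, s in enumerate(sentences):
        # Centrality: average similarity of this sentence to every other candidate.
        others = [cosine(s, sentences[j]) for j in range(n) if j != i]
        centrality = sum(others) / len(others) if others else 0.0
        scores.append(usefulness[i] + centrality)

    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    for i in ranked:
        # Keep a sentence only if it is not too similar to one already selected.
        if all(cosine(sentences[i], sentences[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [sentences[i] for i in chosen]


sents = ["use a list comprehension", "use a list comprehension",
         "call sorted on the list", "unrelated text here"]
print(summarize(sents, [0.9, 0.9, 0.5, 0.1], k=2))
# ['use a list comprehension', 'call sorted on the list']
```

Greedy selection with a similarity cutoff is one simple way to drop semantically redundant sentences; the default `k = 5` mirrors the five sentences TECHSUMBOT returns.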

Talk #3: Real World Projects, Real Faults: Evaluating Spectrum Based Fault Localization Techniques on Python Projects
by Ratnadira WIDYASARI, PhD Candidate
for 45th International Conference on Software Engineering (ICSE 2023)

Spectrum Based Fault Localization (SBFL) is a statistical approach that identifies faulty code within a program from its program spectra, i.e., which program elements each test executes, together with the test outcomes. Several SBFL techniques have been proposed over the years, but most evaluations of these techniques have been done only on Java and C programs and frequently involve artificial faults. Given the current popularity of Python, it is increasingly important to understand how SBFL techniques perform on Python projects. In this work, our objective is to analyze the effectiveness of popular SBFL techniques on real-world Python projects. We also aim to compare the observed performance on Python with previously reported performance on Java. We find that 1) the evaluated SBFL techniques perform worse on BugsInPy, a benchmark of real faults in Python projects, than previously reported on Java; 2) older techniques outperform newer techniques across a variety of metrics and debugging scenarios; 3) claims from preceding studies done on artificial faults in C and Java do not hold for real Python faults; and 4) lower-performing techniques can outperform higher-performing techniques in some cases. Our results yield insight into how popular SBFL techniques perform on real Python faults and emphasize the importance of evaluating SBFL techniques on real faults.
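The abstract does not name the SBFL techniques that were evaluated. As a general illustration of how SBFL turns a program spectrum into a ranking, the Python sketch below computes the widely used Ochiai suspiciousness score, ef / sqrt((ef + nf) * (ef + ep)), where ef and ep count the failing and passing tests that execute an element and nf counts the failing tests that do not. The function name and the coverage-matrix layout are assumptions made for this example.

```python
import math
from typing import List


def ochiai(coverage: List[List[bool]], failed: List[bool]) -> List[float]:
    """Ochiai suspiciousness for each program element.

    coverage[t][e] is True when test t executes element e (the program spectrum);
    failed[t] is True when test t fails.
    """
    total_failed = sum(failed)  # equals ef + nf for every element
    n_elems = len(coverage[0]) if coverage else 0
    scores = []
    for e in range(n_elems):
        ef = sum(1 for t, row in enumerate(coverage) if row[e] and failed[t])
        ep = sum(1 for t, row in enumerate(coverage) if row[e] and not failed[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores


# Toy spectrum: 3 tests (rows) x 3 program elements (columns); only test 1 fails.
coverage = [
    [True, True, False],   # test 0, passes
    [True, False, True],   # test 1, fails
    [True, True, True],    # test 2, passes
]
failed = [False, True, False]
scores = ochiai(coverage, failed)
print([round(s, 3) for s in scores])  # [0.577, 0.0, 0.707]
```

Elements are then inspected in decreasing order of suspiciousness. In this toy spectrum the third element ranks first because it is executed by the failing test and by only one passing test.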

Talk #4: What Do Users Ask in Open-Source AI Repositories? An Empirical Study of GitHub Issues
by YANG Zhou, PhD Candidate
for 20th International Conference on Mining Software Repositories (MSR 2023)

The speaker will share findings from an empirical study of GitHub issues in open-source AI repositories. The talk will discuss 13 categories of issues that developers may face, with runtime errors and unclear instructions being the most common. It will also cover issue management features and offer recommendations for improving the quality of open-source AI repositories. The talk will be useful for developers who use, or plan to use, these repositories.

Talk #5: NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python
by YANG Zhou, PhD Candidate
for 20th International Conference on Mining Software Repositories (MSR 2023)

The speaker will discuss the NICHE dataset, which includes 572 high-quality machine learning (ML) projects manually labeled as engineered or non-engineered. This dataset is a useful resource for researchers studying best practices in ML projects and can be used to benchmark classifiers. The talk is relevant to researchers and practitioners in machine learning and software engineering.

Talk #6: BiasFinder: Metamorphic Test Generation to Uncover Bias for Sentiment Analysis Systems
by YANG Zhou, PhD Candidate
for 45th International Conference on Software Engineering (ICSE 2023)

The speaker will discuss BiasFinder, which uses metamorphic testing to automatically detect demographic bias in Sentiment Analysis (SA) systems. BiasFinder generates new texts by mutating words associated with a demographic characteristic and checks whether the SA system's prediction changes, which would uncover bias. The paper evaluates BiasFinder on 10 SA systems and 2 large-scale datasets, and the results show that it creates more bias-uncovering test cases than two baselines. The talk is relevant to AI and machine learning researchers and practitioners, particularly those concerned with mitigating bias in AI systems.
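The metamorphic relation behind this approach can be sketched in a few lines of Python: swapping words tied to a demographic characteristic should not change the predicted sentiment, so any original/mutant pair whose predictions differ is a bias-uncovering test case. This is a simplified illustration, not BiasFinder's implementation; the fixed gender word list, the whitespace tokenization, and the deliberately biased toy classifier are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical word list for one demographic characteristic (gender).
GENDER_SWAPS: Dict[str, str] = {"he": "she", "him": "her", "his": "her", "man": "woman"}


def mutate(text: str, swaps: Dict[str, str]) -> str:
    """Create a mutant by replacing words associated with a demographic characteristic."""
    return " ".join(swaps.get(tok, tok) for tok in text.split())


def find_bias(texts: List[str], classify: Callable[[str], str],
              swaps: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (original, mutant) pairs whose predicted sentiment differs.

    Metamorphic relation: the mutation should not change the sentiment,
    so each returned pair is a bias-uncovering test case.
    """
    uncovered = []
    for text in texts:
        mutant = mutate(text, swaps)
        if mutant != text and classify(text) != classify(mutant):
            uncovered.append((text, mutant))
    return uncovered


def toy_classifier(text: str) -> str:
    """Hypothetical, deliberately biased sentiment classifier for demonstration."""
    words = text.split()
    if "great" in words:
        return "positive"
    if "she" in words:  # injected bias: texts mentioning "she" lean negative
        return "negative"
    return "neutral"


cases = find_bias(["he was late again", "he did a great job"], toy_classifier, GENDER_SWAPS)
print(cases)  # [('he was late again', 'she was late again')]
```

Only the first input exposes the injected bias; the second keeps the same prediction after mutation, so it satisfies the metamorphic relation.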

Talk #7: PICASO: Enhancing API Recommendations with Relevant Stack Overflow Posts
by ZHANG Ting, PhD Candidate
for 20th International Conference on Mining Software Repositories (MSR 2023)

Previous studies on API recommendation take a natural language query and identify which APIs would be suitable for the given task. However, these studies consider only one source of input, either GitHub or Stack Overflow, in isolation. No existing approach uses Stack Overflow to help generate better API sequence recommendations from queries obtained from GitHub. In this work, we propose PICASO, which uses a bi-encoder trained with contrastive learning and a cross-encoder classification model to find a semantically similar Stack Overflow post for a given annotation (i.e., a code comment). In our experiments, incorporating Stack Overflow information into CodeBERT improved the BLEU-4 score of API sequence generation by 10.8%.

About the Speakers

		ZHOU Xin is a Ph.D. student in SCIS, under the supervision of Prof. David LO. Xin's research focuses on pre-trained code representation and automation for software maintenance and development.

		YANG Chengran is a Ph.D. student in SCIS, under the supervision of Prof. David LO. Chengran’s research focuses on software artifact summarization and maintenance.

		Ratnadira Widyasari is a PhD Candidate in Computer Science at the SMU School of Computing and Information Systems, supervised by Prof. David LO. Her research focuses on automated software engineering.

		YANG Zhou is a Ph.D. student in SCIS, supervised by Prof. David LO. Zhou is working hard on the "RESPECTED AI" project, which stands for: Robust, Explainable, Secure, Privacy-aware, Efficient, Correct, Transferable, Ethical, and Deployable AI.

		ZHANG Ting is a Ph.D. candidate at SMU SCIS, supervised by Prof. David LO and Prof. Lingxiao Jiang. Her research focuses on automatic software bug management, from detecting duplicate bug reports to repairing API misuse bugs.
