About The Talk
The code hosting platform GitHub has been gaining immense popularity worldwide, with over 200 million repositories hosted as of June 2021. However, while research on improving GitHub repositories has the potential to create widespread improvement, much of the existing research focuses on code quality. Fewer works focus on other aspects, although the quality of a GitHub repository is also affected by factors such as documentation, the project's dependencies, and the pool of contributors.
The three works in this dissertation investigate aspects of GitHub repositories beyond the code, and identify specific potential improvements applicable to a wide range of GitHub repositories. The first work aims to provide a systematic understanding of the content of README files in GitHub software projects, and to develop a tool that can process them automatically. It begins with a study of a dataset of README file sections, followed by the development and evaluation of a multi-label classifier that can predict eight different README content categories.
The second work analyzes vulnerabilities in open-source libraries used by software projects on GitHub. This study is conducted on commits of 450 software projects written in popular languages (Java, Python, and Ruby), and identifies characteristics of dependency vulnerabilities such as common types and persistence. It also investigates the relationship between various attributes of software projects and their commits and the presence of such vulnerabilities. The findings of this work have a number of implications for library users, library developers, and researchers.
Finally, the third work is a multi-region geographical analysis of gender inclusion on GitHub. This work uses a mixed-methods approach involving a quantitative analysis of commit authors of 21,456 project repositories, followed by a strategically targeted worldwide survey of developers and a qualitative analysis of the responses. Among other aspects, the work investigates differences in diversity levels between regions, how they change over time, and the correlation between gender and geographic diversity of a repository's commit authors. Further, analysis of the survey results enabled the identification of barriers and motivations to contribute to open-source software. The results of this work provide insights into the current state of gender diversity in open-source software and potential ways to improve the participation of developers from under-represented regions and genders. This, in turn, can improve the open-source software community in general.
Speaker Biography
Gede Artha Azriadi Prana is a PhD student in the School of Computing and Information Systems, Singapore Management University, under the supervision of Professor David Lo. His research focuses on software engineering analytics. He received his Bachelor of Engineering degree in Computer Engineering from Nanyang Technological University and his Master of Technology degree in Knowledge Engineering from the National University of Singapore. Prior to enrolling at SMU, he worked for about a decade in software developer, quality assurance, and data analyst roles in several industries.