Talk #1: Revisiting Assert Use in GitHub Projects
Assertions are often used to test the assumptions that developers have about a program. An assertion contains a boolean expression which developers believe to be true at a particular program point. It throws an error if the expression is not satisfied, which helps developers to detect and correct bugs. Since assertions make developer assumptions explicit, they are also believed to improve the understandability of code. Recently, Casalnuovo et al. analysed C and C++ programs to understand the relationship between assertion usage and defect occurrence. Their results show that asserts have a small effect on reducing the density of bugs, and that developers often add asserts to methods of which they have prior knowledge and greater ownership. In this study, we perform a partial replication of the above study on a large dataset of Java projects from GitHub (185 projects, 20 million LOC, 4 million commits, 0.2 million files and 1 million methods). We collect metrics such as the number of asserts, number of defects, number of developers and number of lines changed in a method, and examine the relationship between asserts and defect occurrence. We also analyse the relationship between developer experience and ownership and the number of asserts. Furthermore, we study which types of asserts developers add and why they add them. We find that asserts have a small yet significant relationship with defect occurrence, and that developers who have added asserts to methods often have higher ownership of and experience with those methods than developers who did not add asserts.
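To make the notion of an assertion concrete, here is a minimal Java sketch. It is an illustration only and is not taken from the study or its dataset; the class `AssertDemo` and the method `withdraw` are hypothetical names. Note that Java assertions are disabled by default and must be enabled with the `-ea` JVM flag.

```java
public class AssertDemo {
    // Hypothetical helper: returns the balance left after a withdrawal.
    static int withdraw(int balance, int amount) {
        // The developer's assumption, made explicit: callers never
        // pass a negative amount and never overdraw the balance.
        assert amount >= 0 && amount <= balance : "invalid amount: " + amount;
        return balance - amount;
    }

    public static void main(String[] args) {
        // The assumption holds here, so the assert is silent.
        System.out.println(withdraw(100, 30));

        try {
            // The assumption is violated here. With assertions enabled
            // (java -ea AssertDemo) this throws an AssertionError.
            withdraw(100, 500);
            System.out.println("assertions disabled; violation went unnoticed");
        } catch (AssertionError e) {
            System.out.println("assertion failed: " + e.getMessage());
        }
    }
}
```

Running the class with and without `-ea` shows the difference: the assert only reports the violated assumption when assertions are enabled.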
Talk #2: An Exploratory Study of Functionality and Learning Resources of WebAPIs on ProgrammableWeb
Web APIs provide various functionalities that developers can leverage in building their applications. ProgrammableWeb, which is the largest and most active web API and mashup collection, provides a record of thousands of web APIs and mashups. However, important properties of this large number of web APIs, such as their functionality and their support and resources for learning, have not been studied in existing research.
In this study, we perform an exploratory analysis of the functionality and learning resources of 9,883 web APIs and 4,315 mashups listed on ProgrammableWeb, and find that: (1) web APIs provide a wide range of functionalities related to business solutions, text analysis, data sources, etc.; many of them are substitutable; only a minority have been used with other APIs; (2) a majority of web APIs on ProgrammableWeb provide resources to support developers in learning how to use the APIs.
Talk #3: Cataloging GitHub Repositories
GitHub is one of the largest and most popular repository hosting services today, with about 14 million users and more than 54 million repositories as of March 2017. This makes it an excellent platform for finding projects that developers are interested in exploring. GitHub showcases its most popular projects by cataloging them manually into categories such as DevOps tools, web application frameworks, and game engines. We propose that such cataloging should not be limited to popular projects. We explore the possibility of developing such a cataloging system by automatically extracting text segments that describe functionality from the readme files of GitHub repositories. These descriptions are then input to LDA-GA, a state-of-the-art topic modeling algorithm, to identify categories. Our preliminary experiments demonstrate that additional meaningful categories, which complement the existing GitHub categories, can be inferred. Moreover, for inferred categories that match GitHub categories, our approach can identify additional projects belonging to them. Our experimental results establish a promising direction toward realizing an automatic cataloging system for GitHub.
These are pre-conference talks for the 21st International Conference on Evaluation and Assessment in Software Engineering (EASE 2017).
About the Speaker
Pavneet is a PhD candidate in the School of Information Systems at Singapore Management University, working with Associate Professor David Lo and Assistant Professor Lingxiao Jiang. In 2015, he was an intern at Microsoft Research, and prior to that he completed an exchange programme at Carnegie Mellon University. His research interests involve data analytics for software engineering, with a particular focus on software metrics, software testing, bug localization and reliability.