In this talk, we will present two papers accepted at ICSE 2022: "Detecting False Alarms from Automatic Static Analysis Tools: How Far are We?" and "Active Learning of Discriminative Subgraph Patterns for API Misuse Detection".
In the first paper, we investigate the high false alarm rate of automatic static analysis tools (ASATs), e.g., FindBugs. Researchers have proposed using machine learning to prune false alarms. The state-of-the-art study identified a set of “Golden Features” based on metrics computed over the characteristics and history of the file, code, and warning. Recent studies show that machine learning using these features achieves almost perfect performance.
We analyze approaches using the “Golden Features” to better understand their strong performance. We find that several studies used an experimental procedure that results in data leakage and data duplication, which are subtle issues with significant implications. Firstly, the ground-truth labels have leaked into features that measure the proportion of actionable warnings in a given context. Secondly, many warnings in the testing dataset also appear in the training dataset. Next, we demonstrate limitations of the warning oracle that determines the ground-truth labels, a heuristic that compares warnings in a given revision against a reference revision in the future. We show that the choice of reference revision influences the warning distribution. Moreover, the heuristic produces labels that do not agree with human oracles. Hence, the strong performance previously reported for these techniques overestimates how they would perform if adopted in practice. Our results convey several lessons and provide guidelines for evaluating false alarm detectors.
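To make the duplication issue concrete, here is a minimal sketch (not code from the paper; the warning fields and values are invented for illustration) that measures what fraction of a test set also occurs verbatim in the training set. When this rate is high, a model can score well simply by memorizing warnings it has already seen.

```python
# Hypothetical illustration of the data-duplication issue: warnings that
# appear in both the training and testing splits inflate measured performance.

def duplication_rate(train, test):
    """Fraction of test warnings that also occur verbatim in training."""
    train_keys = {(w["rule"], w["file"], w["line"]) for w in train}
    dupes = [w for w in test if (w["rule"], w["file"], w["line"]) in train_keys]
    return len(dupes) / len(test) if test else 0.0

# Toy data: each warning is identified by its rule, file, and line.
train = [
    {"rule": "NP_NULL_DEREF", "file": "A.java", "line": 10, "actionable": True},
    {"rule": "DM_EXIT", "file": "B.java", "line": 22, "actionable": False},
]
test = [
    {"rule": "NP_NULL_DEREF", "file": "A.java", "line": 10, "actionable": True},
    {"rule": "SE_BAD_FIELD", "file": "C.java", "line": 5, "actionable": False},
]

print(duplication_rate(train, test))  # 0.5: half the test set leaked from training
```

A sound evaluation would deduplicate warnings (or split by project and time) before measuring performance, so that the test set contains only warnings the model has never seen.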
In the second paper, we investigate API misuses. A common cause of bugs is the violation of usage constraints associated with Application Programming Interfaces (APIs). API misuses are widespread, and although techniques have been proposed to detect them, studies show that these techniques miss many misuses while reporting many false positives. One limitation of prior work is its inability to reliably identify correct usage patterns: many approaches mistake a usage pattern's frequency for correctness. Because there are many alternative usage patterns that are uncommon but correct, anomaly detection-based techniques have had limited success. We propose ALP, which reformulates API misuse detection as a classification problem. After representing programs as graphs, ALP mines discriminative subgraphs. It incorporates active learning to shift human attention away from the most frequent patterns; instead, ALP samples informative and representative examples for labeling.
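The sampling idea can be sketched as follows. This is a simplified stand-in for ALP's actual query strategy, with invented scores and feature vectors: each unlabeled example carries a model probability and a feature vector, and we prefer examples that are both uncertain (probability near 0.5) and representative (dense in the unlabeled pool), rather than simply the most frequent ones.

```python
# Minimal sketch of informative-and-representative sampling for active
# learning (illustrative only, not ALP's algorithm).
import math

def similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_query(pool):
    """Pick the index of the example that is both uncertain and representative."""
    def score(i):
        p, feats = pool[i]
        uncertainty = 1.0 - abs(p - 0.5) * 2.0  # 1.0 at p=0.5, 0.0 at p in {0, 1}
        density = sum(similarity(feats, f)      # mean similarity to the rest
                      for j, (_, f) in enumerate(pool) if j != i) / (len(pool) - 1)
        return uncertainty * density
    return max(range(len(pool)), key=score)

# Toy pool: (model probability, feature vector) per unlabeled example.
pool = [
    (0.95, [1.0, 0.0, 0.0]),  # confident, isolated
    (0.52, [0.0, 1.0, 0.2]),  # uncertain, near a cluster
    (0.48, [0.0, 0.9, 0.3]),  # uncertain, near a cluster
    (0.10, [0.0, 0.0, 1.0]),  # confident
]
print(select_query(pool))  # 2: uncertain and close to the small cluster
```

The product of the two terms ensures that a confidently classified example is never queried, however representative it is, and that an uncertain outlier is deprioritized in favor of uncertain examples that stand in for many others.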
In our empirical evaluation, ALP outperforms prior approaches on both MUBench and a newly constructed dataset.
This is a pre-conference talk for the 44th International Conference on Software Engineering (ICSE 2022).
KANG Hong Jin is a PhD candidate at the School of Computing and Information Systems, Singapore Management University, supervised by Prof. David Lo.