Background - Xu et al. used a deep neural network (DNN) technique to classify the degree of relatedness between two knowledge units (question-answer threads) on Stack Overflow. More recently, extending Xu et al.’s work, Fu and Menzies proposed a simpler classification technique based on a fine-tuned support vector machine (SVM) that achieves similar performance but in a much shorter time. Thus, they suggested that researchers need to compare their sophisticated methods against simpler alternatives.
Aim - The aim of this work is to replicate the previous studies and further investigate the validity of Fu and Menzies’ claim by evaluating the DNN- and SVM-based approaches on a larger dataset. We also compare the effectiveness of these two approaches against SimBow, a lightweight SVM-based method that was previously used for general community question-answering.
Method - We (1) collect a large dataset containing knowledge units from Stack Overflow, (2) show the value of the new dataset in addressing shortcomings of the original one, (3) re-evaluate both the DNN- and SVM-based approaches on the new dataset, and (4) compare the performance of the two approaches against that of SimBow.
Results - We find that: (1) there are several limitations in the original dataset used in the previous studies, (2) the effectiveness of both Xu et al.’s and Fu and Menzies’ approaches (as measured using F1-score) drops sharply on the new dataset, (3) similar to the previous finding, the performance of the SVM-based approaches (Fu and Menzies’ approach and SimBow) is slightly better than that of the DNN-based approach, (4) contrary to the previous findings, Fu and Menzies’ approach runs much slower than the DNN-based approach on the larger dataset, as its runtime grows sharply as the dataset size increases, and (5) SimBow outperforms both Xu et al.’s and Fu and Menzies’ approaches in terms of runtime.
This is a pre-conference talk for the 12th International Symposium on Empirical Software Engineering and Measurement.
Bowen XU is a PhD student at the School of Information Systems, Singapore Management University, advised by Associate Professor David Lo. He received his M.Eng. from the College of Software Technology, Zhejiang University, in 2017. His research interests are in software engineering, especially mining software repositories and automated program repair.