Buggy File Identification Based on Recommendation Lists

doi:10.11896/jsjkx.230600088

Abstract

Abstract: Bug localization is a key step in bug fixing, but it is also a tedious software engineering activity. Existing static bug localization techniques typically treat the problem as a search task: for each bug report, they generate a list of recommended files ranked in descending order of their relevance to the bug. However, developers must still manually review each file in the list to find the ones that are actually buggy, which increases the time and cost of localization. To address this problem, this paper proposes a two-step approach. First, a state-of-the-art information-retrieval-based (IR-based) bug localization technique is run to obtain an initial recommendation list of potentially buggy files. Then, three types of domain features are designed according to the characteristics of the problem, and a machine learning model built on these features is used to identify the truly buggy files in the list. Preliminary experiments verify that the approach is reasonable and practical. Experiments on four open source projects (ZooKeeper, OpenJPA, Tomcat, and AspectJ) with 2558 bugs show that the approach achieves a prediction accuracy of 72.6% to 80.7% in identifying the buggy files in the initial recommendation list. We also examine the three feature subsets and the importance of each feature in predicting the truly buggy files, and find that features describing the relationship between the bug report and the source code are more important than the others.
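The two-step idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the IR technique, the features, or the learning model, so TF-IDF cosine similarity, the two features (similarity score and reciprocal rank), logistic regression, and all function names below are hypothetical stand-ins.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(report, files, k=3):
    """Step 1 (assumed IR stand-in): rank files by similarity, keep the top k."""
    names = list(files)
    vecs = tfidf_vectors([tokenize(report)] + [tokenize(files[n]) for n in names])
    scored = sorted(((cosine(vecs[0], v), n) for v, n in zip(vecs[1:], names)),
                    reverse=True)
    return [(name, score) for score, name in scored[:k]]


def features(recommendation):
    """Hypothetical features per candidate: [similarity score, 1 / rank]."""
    return [[score, 1.0 / rank]
            for rank, (_, score) in enumerate(recommendation, start=1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Step 2 (assumed model): logistic regression fitted with plain SGD."""
    w = [0.0] * (len(X[0]) + 1)  # the last weight is the bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, xi)))
            g = p - yi  # gradient of the log loss with respect to the logit
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w


def predict(w, x):
    """Return True if the candidate file is predicted to be truly buggy."""
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5


if __name__ == "__main__":
    files = {
        "SessionManager.java": "class SessionManager expire session timeout null check",
        "Logger.java": "class Logger write log message",
        "Parser.java": "class Parser parse config token",
    }
    recs = recommend("Session is null when session timeout expires", files)
    # Toy labels (1 = truly buggy); real labels would come from bug-fix history.
    X, y = features(recs), [1, 0, 0]
    w = train_logistic(X, y)
    for (name, _), x in zip(recs, X):
        print(name, predict(w, x))
```

In a realistic setting the classifier would be trained on recommendation lists from historical bug reports and evaluated on held-out reports; the toy data here only demonstrates how the second step filters the list produced by the first.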

Key words: Bug report, Bug localization, Machine learning, Information retrieval, Buggy files

CLC Number: TP311

WANG Zhaodan, ZOU Weiqin, LIU Wenjie. Buggy File Identification Based on Recommendation Lists[J]. Computer Science, 2024, 51(6A): 230600088-8.

