Buggy File Identification Based on Recommendation Lists

doi:10.11896/jsjkx.230600088

Computer Science, 2024, Vol. 51, Issue (6A): 230600088-8

Section: Computer Software & Architecture


WANG Zhaodan, ZOU Weiqin, LIU Wenjie

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Published: 2024-06-06
Corresponding author: ZOU Weiqin (weiqin@nuaa.edu.cn)
About the authors: WANG Zhaodan (wangzhaodan@nuaa.edu.cn), born in 1999, postgraduate. Her main research interests include static software bug localization.
ZOU Weiqin, born in 1988, Ph.D., professor, is a member of CCF (No. D3300M). Her main research interests include bug localization and software repository mining.
Supported by: National Natural Science Foundation of China (62002161), Fund of Prospective Layout of Scientific Research for NUAA (Nanjing University of Aeronautics and Astronautics), and Scientific Research Foundation for the Introduction of Talent for NUAA.

Abstract

Bug localization is a key step in bug fixing, but it is also a tedious software activity. Existing static bug localization techniques typically treat the task as retrieval: for each bug report they generate a list of suspicious files ranked in descending order of relevance to the bug. However, developers still have to inspect the recommended files one by one to find those that are actually buggy, which increases the time and cost of localization. To address this problem, this paper proposes the following approach. First, a state-of-the-art information-retrieval-based (IR-based) static bug localization technique is run to obtain an initial recommendation list of suspicious files. Then, three categories of domain features are proposed based on the characteristics of the problem, and a machine learning model built on these features attempts to identify the truly buggy source files in the list. Preliminary experiments verify that the proposed approach is reasonable and actionable in practice. Experiments on 2558 bugs from four open-source projects (ZooKeeper, OpenJPA, Tomcat, AspectJ) show that the approach achieves 72.6% to 80.7% accuracy in predicting truly buggy files from the initial recommendation list. The paper also examines the three feature subsets and the importance of each individual feature for predicting truly buggy files, and finds that the features capturing the relationship between the bug report and the source code are more important than the others.

Key words: Bug report, Bug localization, Machine learning, Information retrieval, Buggy files
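The abstract describes a two-stage pipeline: an IR-based localizer ranks candidate files for a bug report, then a learned model decides which of the recommended files are truly buggy. The sketch below illustrates only that overall shape, under loudly stated assumptions. Stage 1 uses a plain TF-IDF cosine-similarity ranker (`rank_files`) as a stand-in for the specific IR-based tools the authors ran. Stage 2 uses three hypothetical list-position features (`features`: similarity score, reciprocal rank, gap to the top score) and a hand-rolled logistic regression (`train_logreg`); these are not the paper's three feature categories or its chosen learner. The bug report, file names, and training vectors are invented toy data.

```python
import math
import re
from collections import Counter

# Stage 1 (assumption): a plain TF-IDF vector-space ranker standing in for an
# IR-based bug localization tool; files are ranked by cosine similarity to the report.

def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())

def tfidf(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
             for t, c in Counter(doc).items()} for doc in docs]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_files(report, files):
    """Return [(file_name, score)] sorted by descending similarity to the report."""
    vecs = tfidf([tokenize(report)] + [tokenize(src) for src in files.values()])
    scores = [(name, cosine(vecs[0], v)) for name, v in zip(files, vecs[1:])]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Stage 2 (assumption): hypothetical per-candidate features, NOT the paper's
# three feature categories.

def features(ranked):
    """[similarity, reciprocal rank, gap to top score] for each list entry."""
    top = ranked[0][1]
    return [[score, 1.0 / (i + 1), top - score]
            for i, (_, score) in enumerate(ranked)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Binary logistic regression fitted by full-batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy usage: every value below is made up for illustration.
REPORT = "Crash when closing the session: session close throws null pointer"
FILES = {
    "SessionManager.java": "class SessionManager { void closeSession(Session session) { session.close(); } }",
    "ConfigParser.java": "class ConfigParser { Config parse(String path) { return new Config(path); } }",
    "Logger.java": "class Logger { void log(String message) { System.out.println(message); } }",
}
X_TRAIN = [[0.9, 1.0, 0.0], [0.6, 0.5, 0.1], [0.7, 1.0, 0.0],
           [0.1, 0.5, 0.7], [0.05, 0.33, 0.8], [0.0, 0.25, 0.9]]
Y_TRAIN = [1, 1, 1, 0, 0, 0]  # 1 = truly buggy, 0 = not buggy

model = train_logreg(X_TRAIN, Y_TRAIN)
ranked = rank_files(REPORT, FILES)
probs = [predict_proba(model, x) for x in features(ranked)]
flagged = [name for (name, _), p in zip(ranked, probs) if p >= 0.5]
for (name, score), p in zip(ranked, probs):
    print(f"{name}: similarity={score:.3f}, P(buggy)={p:.3f}")
print("Flagged as truly buggy:", flagged)
```

The point of the second stage is that it turns a ranked list into a yes/no decision per file, so a developer inspects only the flagged entries rather than walking the list top-down. In a realistic setup the classifier would presumably be trained on candidate lists from historical, already-fixed bug reports; that labeling scheme is an assumption here, not a detail stated in the abstract.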

CLC number: TP311

WANG Zhaodan, ZOU Weiqin, LIU Wenjie. Buggy File Identification Based on Recommendation Lists[J]. Computer Science, 2024, 51(6A): 230600088-8. https://doi.org/10.11896/jsjkx.230600088

