Computer Science, 2021, Vol. 48, Issue (12): 140-148. doi: 10.11896/jsjkx.201100209

• Computer Software •

Code Search Engine for Bug Localization

CHANG Jian-ming1,2,3, BO Li-li1,2,4, SUN Xiao-bing1,2,4

  1 School of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China
  2 Jiangsu Engineering Research Center of Knowledge Management and Intelligent Service, Yangzhou, Jiangsu 225127, China
  3 School of Software, Southeast University, Nanjing 211189, China
  4 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
  • Received: 2020-11-29 Revised: 2021-04-09 Published: 2021-12-15 Online: 2021-11-26
  • Corresponding author: BO Li-li (lilibo@yzu.edu.cn)
  • About author: carrycrebrith@163.com
    CHANG Jian-ming, born in 1998, postgraduate. His main research interests include bug localization.
    BO Li-li, born in 1989, Ph.D., lecturer. Her main research interests include software testing and software security.
  • Supported by:
    National Natural Science Foundation of China (61872312, 61972335, 62002309), Open Funds of the State Key Laboratory for Novel Software Technology of Nanjing University (KFKT2020B15, KFKT2020B16), Yangzhou City-Yangzhou University Science and Technology Cooperation Fund Project (YZU201803), Yangzhou University Top-level Talents Support Program (2019), Six Talent Peaks Project in Jiangsu Province (RJFW-053), Jiangsu “333” Project, and Yangzhou University Cross-disciplinary Project of Animal Science (yzuxk202015).

Abstract: As software projects grow in scale and complexity, fixing bugs becomes increasingly difficult. Most bugs stem from incorrectly written code, and developers and maintainers spend a great deal of time locating and modifying the buggy code. To address this problem, this paper integrates bug reports with the corresponding project change information. It then uses the structural information of the code's abstract syntax tree (AST) to compute the relationship between code blocks and bug reports, thereby building a bug-code knowledge base. On top of this knowledge base, the paper constructs a code search engine for bug localization. The engine recommends more comprehensive localization information, including similar bug reports, related buggy code files and buggy code blocks, helping developers and maintainers locate bugs promptly and effectively. Experimental results show that, compared with existing bug localization methods, the proposed approach localizes buggy code files more accurately and can effectively localize bugs down to the code level.

Key words: Bug localization, Bug report, Abstract syntax tree, Code search
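The abstract does not specify how similarity is computed. The following is a minimal sketch of the two retrieval steps it describes, not the authors' implementation:

1. Find past bug reports similar to a new one and return the files their fixes changed.
2. Rank the function-level code blocks of a suspicious file using identifiers drawn from each function's abstract syntax tree.

It rests on three assumptions:

* Plain TF-IDF cosine similarity stands in for the paper's measure.
* Python's standard `ast` module stands in for the paper's AST analysis.
* The knowledge base is a tiny hypothetical in-memory one. `HISTORY`, `SOURCE` and every function name here are illustrative.

```python
import ast
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; camelCase and snake_case identifiers are split."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """TF-IDF vectors (term -> weight dicts) with smoothed IDF over `docs`."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query_tokens, candidate_tokens):
    """Candidate indices sorted by similarity to the query, best first.

    Assumption: lexical TF-IDF cosine replaces the paper's (unstated) measure.
    """
    vecs = tfidf_vectors([query_tokens] + candidate_tokens)
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def similar_reports(query, history, top_k=3):
    """Step 1: past bug reports most similar to the new one, with their fixed files."""
    order = rank(tokenize(query), [tokenize(r["text"]) for r in history])
    return [(history[i]["id"], history[i]["file"]) for i in order[:top_k]]


def code_blocks(source):
    """Map each function name to identifier tokens taken from its AST subtree."""
    blocks = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            words = [node.name]
            for sub in ast.walk(node):
                if isinstance(sub, ast.Name):
                    words.append(sub.id)
                elif isinstance(sub, ast.arg):
                    words.append(sub.arg)
                elif isinstance(sub, ast.Attribute):
                    words.append(sub.attr)
                elif isinstance(sub, ast.Constant) and isinstance(sub.value, str):
                    words.append(sub.value)
            blocks[node.name] = tokenize(" ".join(words))
    return blocks


def rank_blocks(query, source):
    """Step 2: functions in a suspicious file ranked by similarity to the report."""
    blocks = code_blocks(source)
    names = list(blocks)
    return [names[i] for i in rank(tokenize(query), [blocks[n] for n in names])]


# Hypothetical knowledge-base entries: past reports linked to the file their fix changed.
HISTORY = [
    {"id": "BUG-101", "text": "Crash when config file is empty", "file": "config.py"},
    {"id": "BUG-102", "text": "Wrong total price after applying discount", "file": "cart.py"},
    {"id": "BUG-103", "text": "Empty config value raises KeyError", "file": "config.py"},
]

# Hypothetical contents of config.py (parsed only, never executed).
SOURCE = '''
def load_config(path):
    text = open(path).read()
    return parse_config(text)

def parse_config(text):
    pairs = [line.split("=") for line in text.splitlines()]
    return {key: value for key, value in pairs}

def apply_discount(price, rate):
    return price * (1 - rate)
'''

if __name__ == "__main__":
    report = "KeyError raised when config value is empty"
    print(similar_reports(report, HISTORY))
    print(rank_blocks(report, SOURCE))
```

With this toy data, `BUG-103` (and hence `config.py`) ranks first, and within that file `parse_config` ranks first. Purely lexical matching keeps the sketch dependency-free. A real system would more likely use a language-specific parser for the project under study and a stronger retrieval model.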

CLC number: TP311
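Related articles:
[1] 余笙, 李斌, 孙小兵, 薄莉莉, 周澄. Knowledge-driven Similar Bug Report Recommendation Method[J]. Computer Science, 2021, 48(5): 91-98.
[2] 冉丹, 陈哲, 孙毅, 杨志斌. SCADE Model Checking Based on Program Transformation[J]. Computer Science, 2021, 48(12): 125-130.
[3] 韩磊, 胡建鹏. Redundancy Elimination Algorithm for GCC Abstract Syntax Trees Based on a Keyword Trie[J]. Computer Science, 2020, 47(9): 47-51.
[4] 周凯, 任怡, 汪哲, 管剑波, 张芳, 赵言亢. Classification and Analysis of Ubuntu Operating System Bug Reports Based on Topic Models[J]. Computer Science, 2020, 47(12): 35-41.
[5] 丁嵘, 于千惠. AADL-based Growable Framework for Autonomous Unmanned Systems[J]. Computer Science, 2020, 47(12): 87-92.
[6] 范道远, 孙吉红, 王炜, 涂吉屏, 何欣. Duplicate Bug Report Detection Method Combining Text and Classification Information[J]. Computer Science, 2019, 46(12): 192-200.
[7] 周明泉, 江国华. Fault Localization Method Based on Spectrum Information Combined with Hitting Sets and a Genetic Algorithm[J]. Computer Science, 2018, 45(9): 207-212.
[8] 聂黎明, 江贺, 高国军, 王涵, 徐秀娟. Literature Analysis of Code Search and API Recommendation[J]. Computer Science, 2017, 44(Z6): 475-482.
[9] 陈诚, 郑征, 王皓钦, 乔禹. Non-deadlock Concurrency Bug Localization Method Based on Test Adequacy Criteria[J]. Computer Science, 2017, 44(11): 195-201.
[10] 熊文军, 张璇, 王旭, 李彤, 尹春林. Predicting the Closing Likelihood of Change Request Reports in Issue Tracking Systems[J]. Computer Science, 2017, 44(11): 146-155.
[11] 林涛, 高建华, 伏雪, 马燕, 林艳. Extraction Method for Software Bug Reports[J]. Computer Science, 2016, 43(6): 179-183.
[12] 史高翔, 赵逢禹. Software Bug Assignment Method Based on Bug Similarity and Reassignment Graph[J]. Computer Science, 2016, 43(11): 246-251.
[13] 李晓晨, 江贺, 任志磊. Data-driven Feature Extraction Method for Mining Software Repositories[J]. Computer Science, 2015, 42(9): 159-164.
[14] 李昂, 毛晓光, 雷晏. Fault Localization Method Oriented to Automatic Repair and Incorporating Failure Scenarios[J]. Computer Science, 2015, 42(12): 102-104.
[15] 纪涛, 齐玉华, 毛晓光. GenProg-FL: A Tool for Evaluating Fault Localization Techniques Based on Automatic Software Repair[J]. Computer Science, 2014, 41(9): 88-90.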