Computer Science ›› 2024, Vol. 51 ›› Issue (7): 1-9. doi: 10.11896/jsjkx.230400069

• Computer Software •

Bug Report Reformulation Method Based on Topic Consistency Maintenance and Pseudo-correlation Feedback Library Extension

LIU Wenjie, ZOU Weiqin, CAI Biyu, CHEN Bingting

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2023-04-11 Revised: 2023-07-31 Online: 2024-07-15 Published: 2024-07-10
  • Corresponding author: ZOU Weiqin (weiqin@nuaa.edu.cn)
  • Author e-mail: wenwenmu@nuaa.edu.cn
  • About author: LIU Wenjie, born in 1999, postgraduate. His main research interest is bug localization.
    ZOU Weiqin, born in 1988, Ph.D, associate professor, is a member of CCF (No.D3300M). Her main research interests include bug localization and software repository mining.
  • Supported by:
    National Natural Science Foundation of China (62002161, 62272225) and Fund of Prospective Layout of Scientific Research for Nanjing University of Aeronautics and Astronautics.

Abstract: To help developers locate software bugs faster, researchers have proposed a series of text-retrieval-based bug localization techniques that automatically recommend suspicious code files for bug reports submitted by users. Because users differ in expertise, the quality of their bug reports varies, and some low-quality reports cannot be localized successfully. A common remedy is to reformulate low-quality bug reports to improve their localization results. Existing mainstream reformulation methods, which rely on query expansion and query reduction, tend to produce low-quality reformulations, either because the query topic drifts between the original and the reformulated query or because the pseudo-correlation feedback library they depend on is of poor quality. To address these problems, this paper proposes a bug report reformulation method based on topic consistency maintenance and pseudo-correlation feedback library extension. The method consists of two parts: a query reduction stage that maintains topic consistency and a query expansion stage that extends the pseudo-correlation feedback library. The query reduction stage merges the bug report's summary (its concise problem description) with keywords extracted from the description text, which addresses the topic inconsistency problem. The query expansion stage combines several localization tools (Lucene, BugLocator, and Blizzard) to obtain a pseudo-correlation feedback library and extracts expansion keywords from it, which addresses the low reformulation quality caused by poor existing feedback libraries. Finally, the outputs of the two stages are merged to form the reformulated query. Experiments on six Java projects show that, among low-quality bug reports for which existing bug localization methods fail to place a buggy file in the top 10 recommended files, the proposed method localizes 21%~39% (Accuracy@10), with an MRR@10 of 10%~16%. Compared with existing reformulation techniques, the proposed method improves Accuracy@10 by 7%~32% and MRR@10 by 2%~13%.
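
The two-stage reformulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: keyword extraction uses plain term frequency as a stand-in, the three tools (Lucene, BugLocator, Blizzard) are represented by caller-supplied ranking functions, and the feedback library is taken to be the union of each ranker's top results. All function names and parameters (`reduce_query`, `build_feedback_library`, `expand_query`, `reformulate`, `overlap_ranker`, `k`, `top_n`) are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "is", "in", "on", "of", "to", "and", "when", "it", "with", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens, dropping stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_terms(tokens, k, exclude=()):
    """Return the k most frequent tokens that are not in `exclude`."""
    counts = Counter(t for t in tokens if t not in exclude)
    return [term for term, _ in counts.most_common(k)]


def reduce_query(summary, description, k=3):
    """Reduction stage: keep the summary terms and add k keywords from the description.

    Keeping the summary is what anchors the query to the report's original topic.
    """
    summary_terms = list(dict.fromkeys(tokenize(summary)))  # de-duplicate, keep order
    keywords = top_terms(tokenize(description), k, exclude=set(summary_terms))
    return summary_terms + keywords


def build_feedback_library(query_terms, rankers, corpus, top_n=2):
    """Union of the top_n files returned by each ranker (stand-ins for the real tools)."""
    library = []
    for rank in rankers:
        for path in rank(query_terms, corpus)[:top_n]:
            if path not in library:
                library.append(path)
    return library


def expand_query(query_terms, library, corpus, k=3):
    """Expansion stage: pick k frequent terms from the feedback files, skipping query terms."""
    tokens = [t for path in library for t in tokenize(corpus[path])]
    return top_terms(tokens, k, exclude=set(query_terms))


def reformulate(summary, description, rankers, corpus):
    """Merge the outputs of the reduction and expansion stages into one query."""
    reduced = reduce_query(summary, description)
    library = build_feedback_library(reduced, rankers, corpus)
    return reduced + expand_query(reduced, library, corpus)


def overlap_ranker(query_terms, corpus):
    """Toy ranker: order files by how many distinct query terms they contain."""
    def score(path):
        return len(set(query_terms) & set(tokenize(corpus[path])))
    return sorted(corpus, key=score, reverse=True)
```

In this sketch `corpus` is a dictionary mapping file paths to file text, and `rankers` is a list of functions with the same signature as `overlap_ranker`. Passing several rankers mimics the idea of pooling results from multiple localization tools so that the feedback library does not depend on a single tool's ranking.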

Key words: Bug localization, Query reformulation, Query reduction, Query expansion, Pseudo-correlation feedback libraries, Quality of bug report
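
The Accuracy@10 and MRR@10 figures reported in the abstract can be computed along the following lines. This sketch assumes the commonly used definitions: Accuracy@k is the fraction of bug reports with at least one buggy file among the top k recommendations, and MRR@k averages the reciprocal rank of the first buggy file within the top k, counting 0 when none appears. The paper's exact definitions may differ in detail, and the function and parameter names are hypothetical.

```python
def accuracy_at_k(rankings, buggy_files, k=10):
    """Fraction of bug reports with at least one buggy file in the top k results."""
    hits = sum(
        1
        for ranked, gold in zip(rankings, buggy_files)
        if any(f in gold for f in ranked[:k])
    )
    return hits / len(rankings)


def mrr_at_k(rankings, buggy_files, k=10):
    """Mean reciprocal rank of the first buggy file within the top k (0 if none)."""
    total = 0.0
    for ranked, gold in zip(rankings, buggy_files):
        for position, f in enumerate(ranked[:k], start=1):
            if f in gold:
                total += 1.0 / position
                break  # only the first hit counts for reciprocal rank
    return total / len(rankings)
```

Here `rankings` holds one ranked file list per bug report and `buggy_files` holds the matching sets of files that were actually fixed. A report whose first buggy file sits at rank 2 contributes 1 to Accuracy@10 and 1/2 to MRR@10, which is why MRR@10 is never larger than Accuracy@10.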

CLC Number:

  • TP311