Computer Science ›› 2024, Vol. 51 ›› Issue (7): 1-9. doi: 10.11896/jsjkx.230400069

• Computer Software •

Bug Report Reformulation Method Based on Topic Consistency Maintenance and Pseudo-correlation Feedback Library Extension

LIU Wenjie, ZOU Weiqin, CAI Biyu, CHEN Bingting   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2023-04-11 Revised: 2023-07-31 Online: 2024-07-15 Published: 2024-07-10
  • About author: LIU Wenjie, born in 1999, postgraduate. His main research interest is bug localization.
    ZOU Weiqin, born in 1988, Ph.D., associate professor, is a member of CCF (No. D3300M). Her main research interests include bug localization and software repository mining.
  • Supported by:
    National Natural Science Foundation of China (62002161, 62272225) and the Fund of Prospective Layout of Scientific Research for Nanjing University of Aeronautics and Astronautics.

Abstract: To help developers locate software bugs faster, a series of bug localization techniques based on text retrieval have been proposed. These techniques automatically recommend potentially suspicious code files for the bug reports that users submit. However, because users differ in professional expertise, the quality of bug reports is inconsistent, and some low-quality bug reports cannot be localized successfully. A common way to improve such reports is to reformulate them. Existing mainstream reformulation methods, which rely on query expansion and query reduction, often suffer from inconsistent query topics before and after reformulation or from poor-quality pseudo-correlation (pseudo-relevance) feedback libraries. To address these problems, this paper proposes a bug report reformulation method that maintains topic consistency and extends the pseudo-correlation feedback library. The method consists of two stages. The query reduction stage maintains topic consistency by combining a concise problem description with keywords extracted from the text. The query expansion stage uses several localization tools (Lucene, BugLocator, and Blizzard) to build a more comprehensive pseudo-correlation feedback library, from which additional expansion keywords are extracted; this addresses the low reformulation quality caused by an inadequate pseudo-correlation feedback library. Finally, the outputs of the two stages are combined to form the reformulated query. Experiments on six Java projects show that, among low-quality bug reports whose buggy files could not be ranked in the top 10 recommended files by existing bug localization methods, 21%~39% can be localized with the proposed reformulation method, that is, Accuracy@10 is 21%~39% and MRR@10 is 10%~16%. Compared with existing reformulation techniques, the proposed method improves Accuracy@10 and MRR@10 by 7%~32% and 2%~13%, respectively.
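
The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: Accuracy@k is the share of bug reports with at least one buggy file in the top k recommendations, and MRR@k averages the reciprocal rank of the first buggy file within the top k (counting 0 when none appears). The function names, report identifiers, and file names below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def accuracy_at_k(rankings: Dict[str, List[str]],
                  buggy: Dict[str, Set[str]],
                  k: int = 10) -> float:
    """Fraction of bug reports with at least one buggy file in the top k."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for report_id, files in rankings.items()
        if any(f in buggy.get(report_id, set()) for f in files[:k])
    )
    return hits / len(rankings)


def mrr_at_k(rankings: Dict[str, List[str]],
             buggy: Dict[str, Set[str]],
             k: int = 10) -> float:
    """Mean reciprocal rank of the first buggy file within the top k.

    A report with no buggy file in the top k contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for report_id, files in rankings.items():
        truth = buggy.get(report_id, set())
        for rank, f in enumerate(files[:k], start=1):
            if f in truth:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Hypothetical toy data: ranked file lists per bug report and the
# files that were actually fixed for each report.
rankings = {
    "r1": ["A.java", "B.java"],
    "r2": ["C.java", "D.java"],
    "r3": ["E.java"],
}
buggy = {
    "r1": {"B.java"},   # first hit at rank 2
    "r2": {"C.java"},   # first hit at rank 1
    "r3": {"Z.java"},   # no hit in the list
}

acc10 = accuracy_at_k(rankings, buggy, k=10)
mrr10 = mrr_at_k(rankings, buggy, k=10)
```

On this toy data, two of the three reports have a buggy file in the top 10, and the reciprocal ranks are 1/2, 1, and 0.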

Key words: Bug localization, Query reformulation, Query reduction, Query expansion, Pseudo-correlation feedback libraries, Quality of bug report

CLC Number: 

  • TP311