Approach for Knowledge-driven Similar Bug Report Recommendation

Computer Science, 2021, Vol. 48, Issue (5): 91-98. doi: 10.11896/jsjkx.200600159

YU Sheng, LI Bin, SUN Xiao-bing, BO Li-li, ZHOU Cheng

School of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China
Jiangsu Engineering Research Center of Knowledge Management and Intelligent Service, Yangzhou, Jiangsu 225127, China

Received: 2020-06-28 Revised: 2020-08-01 Online: 2021-05-15 Published: 2021-05-09
Corresponding author: LI Bin (lb@yzu.edu.cn)
About author: YU Sheng, born in 1997, postgraduate. His main research interests include intelligent analysis of software data. (2822863494@qq.com)
LI Bin, born in 1965, Ph.D, professor, Ph.D supervisor, is a senior member of China Computer Federation. His main research interests include software engineering and artificial intelligence.
Supported by:
National Natural Science Foundation of China (61972335, 61872312), Yangzhou city-Yangzhou University Science and Technology Cooperation Fund Project (YZU201803) and Six Talent Peaks Project in Jiangsu Province (RJFW-053).

Abstract

Abstract: Software bugs are inevitable in software development, and submitted bug reports are an important source of information for analyzing and fixing them. Developers often refer to similar historical bug reports and their fixes when analyzing and fixing a new bug. This paper proposes a knowledge-driven approach for recommending similar bug reports. The approach first constructs a bug knowledge graph using information retrieval and Word Embedding techniques. It then uses TF-IDF and Word Embedding to calculate the text similarity between bug reports and, taking the various attributes of each bug into account, derives the similarity of primary and secondary attributes between bug reports. Finally, these similarities are merged into a comprehensive similarity, which is used to recommend similar bug reports. Experimental results show that, compared with the baseline method, the proposed approach improves performance by an average of 12.7% on the Firefox dataset.

Key words: Information retrieval, Knowledge graph, Recommendation systems, Similar bug report, Word embedding
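
The pipeline outlined in the abstract (text similarity, attribute similarity, then a fused score used for ranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the fusion formula, the attribute lists, or the weights, so the weighted sum, the `PRIMARY` and `SECONDARY` field names, and the values of `w_text`, `w_primary`, and `w_secondary` below are all assumptions chosen for illustration. The Word Embedding component and the knowledge graph are omitted; only a plain TF-IDF cosine similarity stands in for the text similarity.

```python
import math
from collections import Counter

# Hypothetical attribute groupings; the paper's actual primary and
# secondary attributes are not listed in the abstract.
PRIMARY = ("product", "component")
SECONDARY = ("priority", "severity")


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF so that terms present in every document keep a weight.
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def attribute_similarity(a, b, fields):
    """Fraction of the given fields that are present and equal in both reports."""
    hits = sum(a.get(f) is not None and a.get(f) == b.get(f) for f in fields)
    return hits / len(fields)


def recommend(query, history, k=2, w_text=0.6, w_primary=0.3, w_secondary=0.1):
    """Rank historical reports by an assumed weighted sum of similarities."""
    docs = [r["text"].lower().split() for r in [query] + history]
    vecs = tfidf_vectors(docs)
    scored = []
    for report, vec in zip(history, vecs[1:]):
        score = (w_text * cosine(vecs[0], vec)
                 + w_primary * attribute_similarity(query, report, PRIMARY)
                 + w_secondary * attribute_similarity(query, report, SECONDARY))
        scored.append((score, report["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up example data, not taken from the Firefox dataset.
history = [
    {"id": 101, "text": "browser crash opening bookmark menu",
     "product": "Firefox", "component": "Bookmarks",
     "priority": "P2", "severity": "major"},
    {"id": 102, "text": "printing page shows blank output",
     "product": "Firefox", "component": "Printing",
     "priority": "P2", "severity": "major"},
]
query = {"id": 0, "text": "crash when opening bookmark toolbar",
         "product": "Firefox", "component": "Bookmarks",
         "priority": "P2", "severity": "major"}

for score, report_id in recommend(query, history):
    print(report_id, round(score, 3))
```

A linear weighted sum is only one plausible way to merge the similarities; it was picked here because it keeps each component's contribution easy to inspect.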

CLC Number: TP311

Cite this article: YU Sheng, LI Bin, SUN Xiao-bing, BO Li-li, ZHOU Cheng. Approach for Knowledge-driven Similar Bug Report Recommendation[J]. Computer Science, 2021, 48(5): 91-98. https://doi.org/10.11896/jsjkx.200600159

