Computer Science, 2021, Vol. 48, Issue (5): 91-98. doi: 10.11896/jsjkx.200600159

• Computer Software

Approach for Knowledge-driven Similar Bug Report Recommendation

YU Sheng, LI Bin, SUN Xiao-bing, BO Li-li, ZHOU Cheng

  1. School of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China
    Jiangsu Engineering Research Center of Knowledge Management and Intelligent Service, Yangzhou, Jiangsu 225127, China
  • Received: 2020-06-28 Revised: 2020-08-01 Online: 2021-05-15 Published: 2021-05-09
  • Corresponding author: LI Bin (lb@yzu.edu.cn)
  • About author: YU Sheng, born in 1997, postgraduate. His main research interests include intelligent analysis of software data. (2822863494@qq.com)
    LI Bin, born in 1965, Ph.D., professor, Ph.D. supervisor, is a senior member of China Computer Federation. His main research interests include software engineering and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61972335, 61872312), Yangzhou City-Yangzhou University Science and Technology Cooperation Fund Project (YZU201803), and Six Talent Peaks Project in Jiangsu Province (RJFW-053).

Abstract: Software bugs are inevitable during software development, and submitted bug reports are an important source of information for analyzing and fixing them. Developers often refer to similar historical bug reports and their fixes when analyzing and fixing a new bug. This paper proposes a knowledge-driven approach to recommending similar bug reports. The approach first builds a bug knowledge graph using information retrieval and Word Embedding. It then uses TF-IDF and Word Embedding to compute the text similarity between bug reports and, taking the attributes of each bug into account, computes the similarity of their primary and secondary attributes. Finally, these similarities are merged into a comprehensive similarity, which is used to recommend similar bug reports. Experimental results show that, compared with the baseline method, the proposed approach improves performance by an average of 12.7% on the Firefox dataset.

Key words: Similar bug report, Information retrieval, Word embedding, Knowledge graph, Recommendation systems
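
The abstract describes fusing a TF-IDF text similarity, a Word Embedding text similarity, and an attribute similarity into one comprehensive score. The following is a minimal sketch of that idea only, not the authors' implementation: the smoothed IDF formula, the mean-of-word-vectors document embedding, the equal-match attribute score, the fusion weights `(0.4, 0.4, 0.2)`, the attribute names `product` and `component`, and the toy two-dimensional embeddings are all assumptions made for illustration. The knowledge graph construction step is not covered.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF is an illustrative choice, not taken from the paper.
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_embedding(doc, emb):
    """Average the embeddings of tokens that have one; returns {dim: value}."""
    vecs = [emb[t] for t in doc if t in emb]
    if not vecs:
        return {}
    return {i: sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))}


def attribute_similarity(a, b, keys):
    """Fraction of the listed attributes on which two reports agree."""
    return sum(a[k] == b[k] for k in keys) / len(keys)


def recommend(query, reports, emb, keys, weights=(0.4, 0.4, 0.2), top_k=5):
    """Rank the other reports by a weighted sum of the three similarities."""
    docs = [r["text"].lower().split() for r in reports]
    tfidf = tfidf_vectors(docs)
    dense = [mean_embedding(d, emb) for d in docs]
    w_tfidf, w_emb, w_attr = weights  # hypothetical weights
    scored = []
    for i, report in enumerate(reports):
        if i == query:
            continue
        score = (w_tfidf * cosine(tfidf[query], tfidf[i])
                 + w_emb * cosine(dense[query], dense[i])
                 + w_attr * attribute_similarity(reports[query], report, keys))
        scored.append((i, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy data, invented for this sketch.
reports = [
    {"text": "crash when opening pdf file",
     "product": "Firefox", "component": "PDF Viewer"},
    {"text": "browser crash while opening pdf document",
     "product": "Firefox", "component": "PDF Viewer"},
    {"text": "bookmark toolbar icon missing",
     "product": "Firefox", "component": "Bookmarks"},
]
emb = {
    "crash": [1.0, 0.0], "pdf": [0.9, 0.1], "file": [0.8, 0.2],
    "document": [0.8, 0.2], "bookmark": [0.0, 1.0],
    "toolbar": [0.1, 0.9], "icon": [0.2, 0.8],
}

if __name__ == "__main__":
    for index, score in recommend(0, reports, emb, keys=("product", "component")):
        print(index, round(score, 3))
```

In this toy run the second report shares terms, embedding direction, and both attributes with the query, so it ranks above the bookmark report. In practice the embeddings would come from a trained model and the weights would need tuning on labeled data.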

CLC number:

  • TP311