Computer Science ›› 2021, Vol. 48 ›› Issue (5): 91-98. doi: 10.11896/jsjkx.200600159
YU Sheng, LI Bin, SUN Xiao-bing, BO Li-li, ZHOU Cheng
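Abstract: Software defects are unavoidable during software development, and submitted bug reports are an important source of information for analyzing and fixing them. Developers often draw on similar historical bug reports and their fix information when analyzing and fixing a new bug. This paper proposes a knowledge-driven method for recommending similar bug reports. The method first builds a bug knowledge graph using information retrieval and word embedding techniques. It then computes the textual similarity between bug reports using TF-IDF and word embeddings, and also takes the attributes of each bug into account to obtain primary-attribute and secondary-attribute similarities between reports. Finally, it fuses these similarities into a comprehensive similarity, which is used to recommend similar bug reports. Experimental results show that, compared with the baseline methods, the proposed method improves performance by 12.7% on average on the Firefox dataset.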
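The fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the fusion formula nor the attribute lists, so the weights `(0.6, 0.3, 0.1)`, the attribute names in `PRIMARY` and `SECONDARY`, the whitespace tokenizer, and the report layout are all assumptions made for illustration. The word embedding component and the knowledge graph are omitted; only TF-IDF cosine similarity stands in for the textual similarity.

```python
import math
from collections import Counter

# Hypothetical attribute groups; the abstract does not list the real ones.
PRIMARY = ("product", "component")
SECONDARY = ("priority", "severity")


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms that occur in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: (cnt / len(doc)) * idf[t] for t, cnt in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def attribute_similarity(a, b, keys):
    """Fraction of the given attributes present in both reports with equal values."""
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return matches / len(keys)


def recommend(query, history, weights=(0.6, 0.3, 0.1), top_k=3):
    """Rank historical reports by a weighted sum of the three similarities.

    The weights are illustrative assumptions, not values from the paper.
    Returns a list of (report id, score) pairs, best match first.
    """
    w_text, w_primary, w_secondary = weights
    docs = [r["text"].lower().split() for r in history]
    docs.append(query["text"].lower().split())
    vectors = tfidf_vectors(docs)
    query_vec = vectors[-1]
    scored = []
    for report, vec in zip(history, vectors):
        score = (
            w_text * cosine(query_vec, vec)
            + w_primary * attribute_similarity(query["attrs"], report["attrs"], PRIMARY)
            + w_secondary * attribute_similarity(query["attrs"], report["attrs"], SECONDARY)
        )
        scored.append((report["id"], score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Each report here is a dict with an `id`, a free-text `text` field, and an `attrs` dict. Because the weights sum to 1 and each component lies in [0, 1], the fused score also stays in [0, 1], which keeps scores comparable across queries. A learned or tuned weighting would replace the fixed tuple in practice.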