Software Bug Triaging Method Based on Text Classification and Developer Rating

doi:10.11896/j.issn.1002-137X.2018.11.030

Abstract

Bug management and repair in open-source software (OSS) projects are important means of ensuring software quality and development efficiency, and improving the efficiency of bug triaging is a key problem that urgently needs to be solved. This paper proposes a developer prediction method based on text classification and developer rating. Its core idea is to build the prediction model by combining machine-learning-based text classification with a rating mechanism based on the source (affiliation) features of bugs. Experiments on hundreds of thousands of fixed bugs from the large OSS projects Eclipse and Mozilla show that, under ten-fold incremental validation, the best average accuracies of the proposed method reach 78.39% and 64.94%, respectively. These are 17.34% and 10.82% higher, respectively, than the highest average accuracies of the baseline method (machine learning classification + tossing graphs), which supports the effectiveness of the proposed method.

Key words: Bug triage, Prediction model, Rating, Support vector machine, Text classification
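
The abstract does not give the rating formula or the rule for combining the two signals, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that the "source" features of a bug are its product and component, that a developer's rating is the normalized count of past fixes in that same product and component, and that the classifier probability and the rating are blended by a weighted sum. The names `rating_scores`, `rank_developers`, and the weight `alpha` are hypothetical, and the classifier output is a hand-written stand-in rather than the result of a trained support vector machine.

```python
from collections import Counter


def rating_scores(fix_history, product, component):
    """Normalized count of each developer's past fixes in the same
    product/component (an assumed rating rule, not the paper's)."""
    counts = Counter(
        dev for dev, prod, comp in fix_history
        if prod == product and comp == component
    )
    top = max(counts.values(), default=0)
    return {dev: n / top for dev, n in counts.items()} if top else {}


def rank_developers(clf_probs, ratings, alpha=0.5, k=3):
    """Blend classifier probabilities with rating scores by a weighted
    sum (assumed) and return the top-k developer names."""
    candidates = set(clf_probs) | set(ratings)
    combined = {
        dev: alpha * clf_probs.get(dev, 0.0)
        + (1 - alpha) * ratings.get(dev, 0.0)
        for dev in candidates
    }
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(combined, key=lambda d: (-combined[d], d))[:k]


# Toy data: (developer, product, component) for previously fixed bugs.
fix_history = [
    ("alice", "Platform", "UI"),
    ("alice", "Platform", "UI"),
    ("bob", "Platform", "UI"),
    ("carol", "JDT", "Core"),
]
# Stand-in for the text classifier's output on a new bug report.
clf_probs = {"alice": 0.2, "bob": 0.5, "carol": 0.3}

ratings = rating_scores(fix_history, "Platform", "UI")
ranking = rank_developers(clf_probs, ratings, alpha=0.5, k=2)
print(ranking)
```

In this toy data the classifier alone would pick bob, while the blended score ranks alice first because she has fixed more bugs in the same product and component. With `alpha=1.0` the ranking falls back to the classifier alone.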

CLC Number: TP311.5

SHI Xiao-wan (史小婉), MA Yu-tao (马于涛). Software Bug Triaging Method Based on Text Classification and Developer Rating[J]. Computer Science (计算机科学), 2018, 45(11): 193-198. https://doi.org/10.11896/j.issn.1002-137X.2018.11.030

