Computer Science (计算机科学), 2018, Vol. 45, Issue (11): 193-198. doi: 10.11896/j.issn.1002-137X.2018.11.030

• Software and Database Technology •

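Software Bug Triaging Method Based on Text Classification and Developer Rating

SHI Xiao-wan, MA Yu-tao

  1. (School of Computer Science, Wuhan University, Wuhan 430072, China)
  • Received: 2017-11-30 Published: 2019-02-25
  • About the authors: SHI Xiao-wan (born 1991), female, master's student; her main research interest is software engineering. MA Yu-tao (born 1980), male, associate professor, CCF senior member; his main research interests are software engineering and services computing. E-mail: ytma@whu.edu.cn (corresponding author).
  • Supported by:
    This work was supported by the National Key Basic Research Program of China (973 Program) (2014CB340404), the National Natural Science Foundation of China (61672387, 61702378), and the Wuhan Yellow Crane Talents Program (Modern Services Industry).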

Abstract: Bug management and repair in open-source software (OSS) projects are important means of ensuring software quality and development efficiency, and improving the efficiency of bug triaging is a key problem that urgently needs to be solved. This paper proposes a developer prediction method based on text classification and developer rating. Its core idea is to build the prediction model by combining machine-learning-based text classification with a rating mechanism based on the source of bugs. Experiments on hundreds of thousands of fixed bugs from the large OSS projects Eclipse and Mozilla show that, in the ten-fold incremental verification mode, the best average accuracies of the proposed method reach 78.39% and 64.94%, respectively. Compared with the highest average accuracies of the baseline method (machine learning classification + tossing graphs), these figures represent increases of 17.34% and 10.82%, respectively, which indicates the effectiveness of the proposed method.

Key words: Bug triage, Prediction model, Rating, Support vector machine, Text classification
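
The abstract names two ingredients that are easier to picture with a small example: the ten-fold incremental verification mode, and a ranking that considers both a classifier score and a developer rating. The sketch below is illustrative only and is not taken from the paper. It assumes the reading of "ten-fold incremental" that is common in the bug triage literature (bug reports sorted chronologically and cut into 11 folds, with round k trained on folds 1..k and tested on fold k + 1), and it assumes a simple weighted sum for combining the two scores. The function names `incremental_folds` and `rank_developers` and the weight `alpha` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def incremental_folds(n_items: int, n_rounds: int = 10) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for incremental validation.

    Assumption: items are already sorted chronologically. They are cut
    into n_rounds + 1 contiguous folds; round k trains on folds 1..k
    and tests on fold k + 1.
    """
    n_folds = n_rounds + 1
    if n_items < n_folds:
        raise ValueError("need at least one item per fold")
    # Fold boundaries; any division remainder is spread across the folds.
    bounds = [i * n_items // n_folds for i in range(n_folds + 1)]
    for k in range(1, n_folds):
        yield range(0, bounds[k]), range(bounds[k], bounds[k + 1])


def rank_developers(clf_scores: Dict[str, float],
                    ratings: Dict[str, float],
                    alpha: float = 0.5) -> List[str]:
    """Rank developers by a weighted sum of two scores.

    Assumption: the weighted sum is a stand-in for illustration; the
    paper's actual combination rule is not given in the abstract.
    """
    developers = set(clf_scores) | set(ratings)
    combined = {
        dev: alpha * clf_scores.get(dev, 0.0)
        + (1.0 - alpha) * ratings.get(dev, 0.0)
        for dev in developers
    }
    # Highest combined score first; ties are broken by name.
    return sorted(combined, key=lambda dev: (-combined[dev], dev))
```

Under these assumptions, the reported average accuracy would be the mean of the per-round accuracies over the ten (train, test) splits, and setting `alpha = 1.0` would reduce the ranking to the text classifier alone.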

CLC Number:

  • TP311.5