Computer Science ›› 2016, Vol. 43 ›› Issue (6): 179-183. doi: 10.11896/j.issn.1002-137X.2016.06.036

• Software and Database Technology •

Extraction Approach for Software Bug Report

LIN Tao, GAO Jian-hua, FU Xue, MA Yan and LIN Yan

  1. Department of Computer Science and Engineering, Shanghai Normal University, Shanghai 200234 (first four authors); Department of Information Systems, University of Auckland, Auckland 92019 (fifth author)
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    National Natural Science Foundation of China (61073163, 61373004) and the Shanghai Enterprise Independent Innovation Special Fund Project (沪CXY-2013-88)

Abstract: The number of bug reports in software engineering is growing rapidly, and developers are increasingly overwhelmed by the volume of accumulated reports. It is therefore necessary to study methods for extracting content from bug reports in support of tasks such as bug fixing and software reuse. This paper proposes an extraction approach. The approach first merges synonyms in bug reports into a single representative word and then builds a vector space model. Text mining methods such as term frequency-inverse document frequency (TF-IDF) and information gain are used to collect word features from bug reports, while a dedicated algorithm determines sentence complexity in order to select long sentences. Finally, a Bayes classifier is introduced for bug report extraction. The approach raises the true positive rate (TPR, or hit rate) and lowers the false positive rate (FPR, or false alarm rate). Experiments show that bug report extraction based on text mining and a Bayes classifier performs well in terms of area under the receiver operating characteristic curve (AUC, 0.71), F-score (0.80), and Kappa value (0.75).

Key words: Bug report management, Text mining, Bayes classifier, Bug report feature, Vector space model, Sentence complexity
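
The pipeline outlined in the abstract (synonym merging, a TF-IDF vector space model, and a Bayes classifier that decides whether a sentence belongs in the extract) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `SYNONYMS` table, the toy training sentences and labels, the smoothed IDF formula, and the choice of a multinomial naive Bayes with add-one smoothing. Information gain feature selection and the sentence complexity algorithm are omitted because the abstract gives no details about them.

```python
import math
from collections import Counter, defaultdict

# Hypothetical synonym table: maps variants to one representative word.
SYNONYMS = {"crash": "fail", "crashes": "fail", "fails": "fail", "error": "fail"}

def tokenize(sentence):
    """Lowercase, strip punctuation, and merge synonyms."""
    words = (w.strip(".,;:!?()").lower() for w in sentence.split())
    return [SYNONYMS.get(w, w) for w in words if w]

def fit_idf(docs):
    """Smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

def tfidf(doc, idf):
    """Vector space representation {term: tf * idf}; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items()}

class NaiveBayes:
    """Multinomial naive Bayes over TF-IDF weights with add-one smoothing."""

    def fit(self, vectors, labels):
        self.vocab = {t for v in vectors for t in v}
        self.weights = defaultdict(Counter)
        for v, y in zip(vectors, labels):
            self.weights[y].update(v)  # accumulate per-class term weights
        counts = Counter(labels)
        self.log_prior = {y: math.log(c / len(labels)) for y, c in counts.items()}
        self.totals = {y: sum(self.weights[y].values()) for y in counts}
        return self

    def predict(self, vector):
        def score(y):
            denom = self.totals[y] + len(self.vocab)
            return self.log_prior[y] + sum(
                w * math.log((self.weights[y][t] + 1) / denom)
                for t, w in vector.items() if t in self.vocab
            )
        return max(self.log_prior, key=score)

# Toy training data (invented): label 1 = keep the sentence, 0 = discard it.
TRAIN = [
    ("The editor crashes when opening a large file.", 1),
    ("Saving fails with an error after the update.", 1),
    ("Thanks for the quick reply.", 0),
    ("I will check this tomorrow.", 0),
]

docs = [tokenize(sentence) for sentence, _ in TRAIN]
idf = fit_idf(docs)
model = NaiveBayes().fit([tfidf(d, idf) for d in docs], [y for _, y in TRAIN])

def classify(sentence):
    """Return the predicted label for one bug report sentence."""
    return model.predict(tfidf(tokenize(sentence), idf))

print(classify("The application crashes on a large project."))
print(classify("Thanks, I will reply tomorrow."))
```

Feeding TF-IDF weights to the classifier as fractional counts is only one way to connect the vector space model to a Bayes classifier; the paper may combine these steps differently.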

