Computer Science, 2016, Vol. 43, Issue (6): 179-183. doi: 10.11896/j.issn.1002-137X.2016.06.036
LIN Tao, GAO Jian-hua, FU Xue, MA Yan and LIN Yan
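Abstract: In software engineering, the number of software bug reports is growing rapidly, and developers are increasingly overwhelmed by their volume. To support goals such as bug fixing and software reuse, it is therefore necessary to study methods for extracting the content of bug reports. This paper proposes an extraction method that first merges synonyms in the bug reports, then builds a vector space model and applies text-mining techniques such as term frequency-inverse document frequency (TF-IDF) and information gain to collect word features from the reports. An algorithm is also designed to measure sentence complexity so that long sentences can be selected, and finally a Bayesian classifier is introduced to this domain. The method raises the hit rate of bug report extraction and lowers the false alarm rate. Experiments show that the extraction method based on text mining and a Bayesian classifier performs well in terms of the area under the receiver operating characteristic (ROC) curve (0.71), F-score (0.80), and Kappa value (0.75).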
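The pipeline described in the abstract can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation. The synonym table, the toy sentences and labels, the Bernoulli naive Bayes model over binary features, the use of a sentence's summed TF-IDF weight as one feature, and word count (`min_len`) as a stand-in for the paper's sentence-complexity algorithm are all hypothetical choices. Label 1 marks a sentence that belongs in the extracted summary. The metric functions show how F-score, Cohen's kappa, and ROC AUC are computed from predictions and scores.

```python
import math
from collections import Counter

# Hypothetical synonym table: every variant maps to one canonical word.
SYNONYMS = {"crash": "fail", "crashes": "fail", "fails": "fail", "bug": "defect"}

def tokenize(sentence):
    """Lower-case, strip simple punctuation, and merge synonyms."""
    words = sentence.lower().replace(".", " ").replace(",", " ").split()
    return [SYNONYMS.get(w, w) for w in words]

def tfidf_totals(docs):
    """Per sentence: sum over its words of raw count * log(N / document frequency)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return [sum(c * math.log(n / df[w]) for w, c in Counter(d).items()) for d in docs]

def entropy(pos, total):
    """Binary entropy in bits of a label set with `pos` positives out of `total`."""
    if total == 0 or pos in (0, total):
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(docs, labels, word):
    """IG of the binary feature 'word occurs in the sentence' w.r.t. the label."""
    n = len(labels)
    inside = [y for d, y in zip(docs, labels) if word in d]
    outside = [y for d, y in zip(docs, labels) if word not in d]
    cond = sum(len(part) / n * entropy(sum(part), len(part)) for part in (inside, outside))
    return entropy(sum(labels), n) - cond

def features(doc, vocab, high_tfidf, min_len=6):
    """Binary features: selected words, a TF-IDF flag, and a length flag."""
    f = {w: w in doc for w in vocab}
    f["__tfidf__"] = high_tfidf
    f["__long__"] = len(doc) >= min_len  # hypothetical complexity proxy
    return f

def train_nb(X, y):
    """Bernoulli naive Bayes with add-one smoothing; all rows share the same keys."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + 1) / (len(X) + 2)
        probs = {k: (sum(r[k] for r in rows) + 1) / (len(rows) + 2) for k in X[0]}
        model[c] = (prior, probs)
    return model

def log_odds(model, x):
    """log P(1 | x) - log P(0 | x); positive means 'put in summary'."""
    score = {}
    for c, (prior, probs) in model.items():
        score[c] = math.log(prior) + sum(
            math.log(probs[k] if v else 1 - probs[k]) for k, v in x.items())
    return score[1] - score[0]

def f_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def cohen_kappa(y_true, y_pred):
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    a, b = sum(y_true) / n, sum(y_pred) / n
    pe = a * b + (1 - a) * (1 - b)
    return (po - pe) / (1 - pe) if pe != 1 else 1.0

def auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented toy data; label 1 = sentence belongs in the summary.
sentences = ["The app crashes when the user clicks save on a large file.", "Thanks.",
             "Stack trace shows a null pointer in the save handler.", "Me too.",
             "The bug fails only when autosave is enabled.", "+1"]
labels = [1, 0, 1, 0, 1, 0]
docs = [tokenize(s) for s in sentences]
totals = tfidf_totals(docs)
mean_total = sum(totals) / len(totals)
# Keep the 5 words with the highest IG (ties broken alphabetically).
vocab = sorted({w for d in docs for w in d},
               key=lambda w: (-information_gain(docs, labels, w), w))[:5]
X = [features(d, vocab, t > mean_total) for d, t in zip(docs, totals)]
model = train_nb(X, labels)
scores = [log_odds(model, x) for x in X]
preds = [int(s > 0) for s in scores]
print(preds, f_score(labels, preds), cohen_kappa(labels, preds), auc(labels, scores))
```

The toy run scores the model on its own training sentences, so it is only a smoke test. A meaningful evaluation would need held-out bug reports with annotated summaries. The Bernoulli model was chosen here because every feature (word presence, TF-IDF flag, length flag) is binary; other naive Bayes variants are equally plausible readings of "Bayesian classifier".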