Computer Science, 2013, Vol. 40, Issue 5: 164-167.

• Software and Database Technology •

Sequential Pattern Mining with a Statistical Strategy and Its Application to Software Defect Prediction

TANG Lei, LI Chun-ping, YANG Liu

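School of Software, Tsinghua University, Beijing 100084; School of Software, Tsinghua University, Beijing 100084; School of Software, Central South University, Changsha 410075
Published: 2018-11-16; Online: 2018-11-16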

Statistically Significant Sequential Pattern Mining Applying to Software Defect Prediction


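Abstract: Human life depends increasingly on software systems with high reliability and availability, and software defects have long been one of the most actively studied topics in software engineering. Building on a study of sequential pattern mining techniques, this paper introduces related techniques for software defect prediction and designs a defect prediction scheme based on sequential pattern mining algorithms that use a statistical strategy. It implements two pattern mining algorithms, InfoMiner and STAMP, chi-square-test feature selection, and classification algorithms such as SVM, and builds a software defect prediction model that can predict and discover unknown defects in software systems. Experimental results show that the proposed prediction model achieves good prediction results and has some practical value and application potential.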
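InfoMiner (Yang, Wang and Yu, KDD 2001) ranks periodic patterns by information gain rather than by raw support, so that a rare event that keeps repeating counts as surprising. The following is a minimal sketch of that measure under our reading of the InfoMiner definitions, which may differ in detail from the paper and from the authors' implementation. Assumptions: the sequence is cut into consecutive segments whose length equals the period; a pattern is a tuple of that length in which `None` stands for the "*" wildcard; the information of an event a is I(a) = -log_|Σ| Prob(a), with Prob(a) estimated by relative frequency and |Σ| taken as the number of distinct events observed; and the gain is G(P) = I(P) × (Rep(P) - 1), where Rep(P) counts matching segments. All function names are hypothetical, and the sketch omits InfoMiner's actual search for high-gain patterns as well as STAMP's extension to noisy repetitions.

```python
import math
from collections import Counter

WILDCARD = None  # stands for the "*" (don't-care) position of a periodic pattern


def event_information(sequence):
    """I(a) = -log_{|Sigma|} Prob(a), Prob(a) estimated by relative frequency (assumption)."""
    counts = Counter(sequence)
    if len(counts) < 2:
        raise ValueError("need at least two distinct events for a log base of |Sigma|")
    n, base = len(sequence), len(counts)
    return {a: -math.log(c / n, base) for a, c in counts.items()}


def segments(sequence, period):
    """Cut the sequence into consecutive, non-overlapping segments of length `period`."""
    return [tuple(sequence[i:i + period])
            for i in range(0, len(sequence) - period + 1, period)]


def matches(segment, pattern):
    """A segment matches if every non-wildcard position holds the same event."""
    return all(p is WILDCARD or p == s for p, s in zip(pattern, segment))


def information_gain(sequence, pattern):
    """G(P) = I(P) * (Rep(P) - 1), where Rep(P) is the number of matching segments."""
    reps = sum(matches(seg, pattern) for seg in segments(sequence, len(pattern)))
    if reps <= 1:  # a pattern seen at most once earns no repetition gain
        return 0.0
    info = event_information(sequence)
    i_p = sum(info[p] for p in pattern if p is not WILDCARD)
    return i_p * (reps - 1)


seq = list("abcabcabdabc")  # hypothetical event sequence, period 3
print(round(information_gain(seq, ("a", WILDCARD, "c")), 3))  # 3.585
print(information_gain(seq, ("a", "b", "d")))                 # 0.0
```

Here ("a", *, "c") repeats in three of the four segments, so it gains twice its information content, while ("a", "b", "d") occurs only once and gains nothing even though "d" is the rarest event.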

Key words: Data mining, Sequential pattern, Software defect, Information gain, Classification prediction

Abstract: People increasingly rely on software systems with high reliability and availability, and software defect prediction has become one of the most active research areas in software engineering. Building on sequential pattern mining, this paper introduces techniques for software defect prediction and designs a defect prediction model based on mining statistically significant patterns. It describes the architecture and detailed implementation of two algorithms, InfoMiner and STAMP. The model, which uses InfoMiner and STAMP to mine patterns, the chi-square test for feature selection, and SVM for classification, can find previously unknown defects with high probability. Experimental results show that the model achieves high prediction accuracy, which suggests it has practical value and good prospects for future use.
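The abstract names the chi-square test as the feature-selection step but does not say how mined patterns become features. A minimal sketch, assuming each program run is a sequence of events labeled defective (1) or clean (0), and each mined pattern becomes a binary feature that is true when the pattern occurs in the run as an in-order, not necessarily contiguous, subsequence. The names `contains_subsequence`, `chi_square` and `select_patterns` and the toy traces are hypothetical; the statistic is the standard Pearson chi-square for a 2×2 contingency table, χ² = n(ad - bc)² / ((a+b)(c+d)(a+c)(b+d)).

```python
def contains_subsequence(trace, pattern):
    """True if the events of `pattern` occur in `trace` in order (gaps allowed)."""
    it = iter(trace)
    # `p in it` advances the shared iterator, so each event must appear after the previous one
    return all(p in it for p in pattern)


def chi_square(feature, labels):
    """Pearson chi-square for a 2x2 table: pattern present/absent vs defective/clean."""
    n = len(labels)
    a = sum(1 for f, y in zip(feature, labels) if f and y)      # present, defective
    b = sum(1 for f, y in zip(feature, labels) if f and not y)  # present, clean
    c = sum(1 for f, y in zip(feature, labels) if not f and y)  # absent, defective
    d = n - a - b - c                                           # absent, clean
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column: treated here as "no association"
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_patterns(traces, labels, patterns, k):
    """Keep the k patterns whose occurrence is most associated with the defect label."""
    scores = {}
    for p in patterns:
        feature = [contains_subsequence(t, p) for t in traces]
        scores[p] = chi_square(feature, labels)
    return sorted(patterns, key=lambda p: scores[p], reverse=True)[:k]


# Hypothetical toy data: four execution traces, label 1 = defective run
traces = [("open", "read", "close"), ("open", "read", "read"),
          ("open", "write", "close"), ("open", "write")]
labels = [0, 1, 0, 1]
patterns = [("open", "close"), ("open", "read"), ("write",)]
print(select_patterns(traces, labels, patterns, 1))  # [('open', 'close')]
```

On this toy data, ("open", "close") appears in exactly the clean runs (χ² = 4.0), while the other two patterns are independent of the label (χ² = 0), so it is the one kept.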

Key words: Data mining, Sequential pattern, Software defect, Information gain, Classification and prediction
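The abstracts say SVM (among other classifiers) is applied to the selected pattern features, without naming the implementation, kernel or parameters. As a dependency-free illustration only, here is a linear SVM trained with the Pegasos stochastic sub-gradient method; this is a swapped-in technique, not the authors' setup. The function names, the regularization constant `lam`, the epoch count, the seed, and the toy feature vectors are all hypothetical, and for brevity the bias is handled as an extra constant feature and regularized along with the weights.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos: stochastic sub-gradient descent on the
    L2-regularized hinge loss. X: list of 0/1 pattern vectors; y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # last weight multiplies a constant 1.0 (bias term)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x = list(X[i]) + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularizer's gradient step
            if margin < 1.0:  # hinge loss active: move w toward y * x
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the linear score; +1 means 'predicted defective' in this sketch."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical toy data: features = [pattern A present, pattern B present]
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [-1, -1, 1, 1]  # -1 = clean run, +1 = defective run
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [-1, -1, 1, 1]
```

Pegasos is used here only because it fits in a few lines without external libraries; a real experiment would more likely rely on an established SVM package and tune the kernel and regularization on held-out data.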

