Imbalanced Data Classification Method and its Application Research for Intrusion Detection

Abstract

Abstract: Traditional classification algorithms often achieve low accuracy, especially on the minority class, when they are applied directly to imbalanced datasets. A new classification method for imbalanced data, based on the K-S statistic, is proposed to improve recognition of the minority class. First, the K-S statistic is used as a correlation measure to remove redundant variables. Then a K-S based decision tree is built to segment the training data into several subsets. Finally, two-way resampling, forward and backward, is used to rebuild the segmented datasets so that classification learning becomes more reasonable. The proposed K-S based method rests on a realistic assumption, is highly efficient, and is widely applicable. Experiments on the KDD99 intrusion detection data show that the method achieves a high classification accuracy rate on both the minority and the majority class of imbalanced datasets.

Key words: Imbalanced data, K-S statistic, Logistic regression, Intrusion detection
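The abstract names the K-S statistic but does not define it. As a minimal sketch, the block below computes the standard two-sample Kolmogorov-Smirnov statistic (the largest gap between the empirical distribution functions of two samples) for one feature, split by class. The function names `ks_statistic` and `rank_features`, the 0/1 label encoding, and the idea of ranking features by this score are assumptions made for illustration; how the paper turns the statistic into a redundancy filter, a decision tree, or a resampling rule is not stated here and is not reproduced.

```python
from bisect import bisect_right


def ks_statistic(minority, majority):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a = sorted(minority)
    b = sorted(majority)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    best = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        best = max(best, abs(cdf_a - cdf_b))
    return best


def rank_features(rows, labels):
    """Return (feature_index, ks) pairs, highest K-S first.

    Hypothetical helper: labels are assumed to be 1 (minority) or 0 (majority).
    """
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((j, ks_statistic(pos, neg)))
    return sorted(scores, key=lambda t: t[1], reverse=True)
```

A value near 1 means the feature separates the two classes almost completely, and a value near 0 means its distribution is the same in both classes.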

JIANG Jie, WANG Zhuo-fang, GONG Rong-sheng and CHEN Tie-ming. Imbalanced Data Classification Method and its Application Research for Intrusion Detection[J]. Computer Science, 2013, 40(4): 131-135.

