Computer Science ›› 2015, Vol. 42 ›› Issue (9): 249-252. doi: 10.11896/j.issn.1002-137X.2015.09.048


Classification Method of Imbalanced Data Based on RSBoost

LI Ke-wen, YANG Lei, LIU Wen-ying, LIU Lu and LIU Hong-tai   

  • Online: 2018-11-14 Published: 2018-11-14

Abstract: Class imbalance, which is common in many application domains, has become a research hotspot in data mining and machine learning. We present a new classification method for imbalanced data, called RSBoost, to increase the recognition rate of the minority class and the classification efficiency. The approach uses SMOTE (synthetic minority over-sampling technique) and random under-sampling to balance the data sets, and then uses boosting to optimize classification performance. We conducted experiments on several public data sets to evaluate the performance of RSBoost and four other methods. The experimental results show that the proposed approach can improve classification performance and efficiency on imbalanced data sets.

Key words: Imbalanced data, Mixed data sampling, Boosting, RSBoost
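
The mixed sampling step described in the abstract (SMOTE over-sampling of the minority class combined with random under-sampling of the majority class) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `target` class size, the neighbour count `k` and the toy data are all assumptions, since the abstract does not state them. The boosting stage is omitted; the balanced set would then be passed to a boosting learner.

```python
import random

def smote(minority, n_new, k, rng):
    """Generate n_new synthetic minority points (needs at least 2 samples).

    Each synthetic point is interpolated between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

def random_undersample(majority, n_keep, rng):
    """Keep n_keep majority samples chosen uniformly without replacement."""
    return rng.sample(majority, n_keep)

def balance(minority, majority, target, k=5, seed=0):
    """Resample both classes towards `target` samples each.

    `target` is a hypothetical parameter; the abstract does not say
    which class ratio RSBoost aims for.
    """
    rng = random.Random(seed)
    n_new = max(0, target - len(minority))
    new_minority = list(minority) + smote(minority, n_new, k, rng)
    new_majority = random_undersample(majority, min(target, len(majority)), rng)
    return new_minority, new_majority

# Toy data (illustrative only): 4 minority points, 40 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(i), float(i % 7)) for i in range(5, 45)]

new_minority, new_majority = balance(minority, majority, target=12)
print(len(new_minority), len(new_majority))  # 12 12
```

Whether the resampling is done once before boosting or repeated inside every boosting round is not stated in the abstract, so the sketch leaves that choice to the caller.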

