Computer Science, 2021, Vol. 48, Issue (11): 184-191. doi: 10.11896/jsjkx.200900107

• Database & Big Data & Data Science •

Imbalanced Data Classification of AdaBoostv Algorithm Based on Optimum Margin

LU Shu-xia1,2, ZHANG Zhen-lian1   

  1. College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002, China
  2. Hebei Province Key Laboratory of Machine Learning and Computational Intelligence, Baoding, Hebei 071002, China
  • Received: 2020-09-11 Revised: 2021-02-06 Online: 2021-11-15 Published: 2021-11-10
  • About author: LU Shu-xia, born in 1966, Ph.D., professor, postgraduate supervisor, is a member of China Computer Federation. Her research interests mainly include machine learning.
  • Supported by:
    National Natural Science Foundation of China (61672205) and Key R&D Program of Science and Technology Foundation of Hebei Province (19210310D).

Abstract: To address the problem of imbalanced data classification, this paper proposes an AdaBoostv algorithm based on the optimal margin. An improved SVM serves as the base classifier: a margin mean term is introduced into the SVM optimization model, and both the margin mean term and the loss function term are weighted by the data imbalance ratio. The optimization model is solved with stochastic variance reduced gradient (SVRG) to improve the convergence rate. In the optimal margin AdaBoostv algorithm, a new adaptive cost-sensitive function is introduced into the instance weight update formula, so that minority instances, misclassified instances, and borderline minority instances are assigned higher cost values. In addition, a new weighting strategy for the base classifiers is derived by combining the new weight formula with an estimate of the optimal margin under the given precision parameter v, which further improves the classification accuracy of the algorithm. Experimental results show that, in both the linear and nonlinear cases, the optimal margin AdaBoostv algorithm achieves higher classification accuracy than the compared algorithms on imbalanced datasets and obtains a larger minimum margin.
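
The SVRG step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact optimization model, so the objective below is an assumption chosen for illustration: a linear SVM with a squared hinge loss, a margin mean term scaled by a hypothetical parameter `mu`, and per-instance weights `C` derived from class frequencies as a stand-in for the imbalance-ratio weighting. All function names, parameter values, and the toy data are hypothetical and are not taken from the paper.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def instance_grad(w, x, y, c, lam, mu):
    # Gradient of the assumed per-instance objective
    # lam/2*||w||^2 + c*(max(0, 1 - y*w.x)^2 - mu*y*w.x)
    slack = max(0.0, 1.0 - y * dot(w, x))
    coef = -c * y * (mu + 2.0 * slack)
    return [lam * wj + coef * xj for wj, xj in zip(w, x)]

def objective(w, X, Y, C, lam, mu):
    data_term = 0.0
    for x, y, c in zip(X, Y, C):
        margin = y * dot(w, x)
        data_term += c * (max(0.0, 1.0 - margin) ** 2 - mu * margin)
    return 0.5 * lam * dot(w, w) + data_term / len(X)

def svrg(X, Y, C, lam=0.1, mu=0.1, step=0.02, epochs=100, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w_snap = [0.0] * d
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per epoch.
        full = [0.0] * d
        for x, y, c in zip(X, Y, C):
            g = instance_grad(w_snap, x, y, c, lam, mu)
            full = [f + gj / n for f, gj in zip(full, g)]
        w = list(w_snap)
        for _ in range(n):
            i = rng.randrange(n)
            g_now = instance_grad(w, X[i], Y[i], C[i], lam, mu)
            g_old = instance_grad(w_snap, X[i], Y[i], C[i], lam, mu)
            # Variance-reduced direction: stochastic gradient corrected
            # by the same instance's gradient at the snapshot.
            w = [wj - step * (a - b + f)
                 for wj, a, b, f in zip(w, g_now, g_old, full)]
        w_snap = w  # the last inner iterate becomes the next snapshot
    return w_snap

# Toy imbalanced data (8 majority, 2 minority); the trailing 1.0 is a bias feature.
majority = [(-1.0, -0.5), (-0.5, -1.0), (-1.0, -1.0), (-0.8, -0.2),
            (-0.2, -0.9), (-0.6, -0.6), (-1.0, 0.0), (0.0, -1.0)]
minority = [(1.0, 0.8), (0.7, 1.0)]
X = [[a, b, 1.0] for a, b in majority + minority]
Y = [-1] * len(majority) + [1] * len(minority)
counts = {label: Y.count(label) for label in (-1, 1)}
# Assumed weighting: each class receives the same total weight.
C = [len(Y) / (2.0 * counts[y]) for y in Y]

w = svrg(X, Y, C)
predictions = [1 if dot(w, x) > 0 else -1 for x in X]
```

The squared hinge is used here only because it is smooth, which keeps the variance-reduced update well behaved at a fixed step size; the paper's actual loss, margin mean term, and weighting scheme may differ.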

Key words: AdaBoostv, Adaptive cost sensitive function, Imbalanced data, Optimum margin, SVRG

CLC Number: 

  • TP181