Computer Science, 2019, Vol. 46, Issue (12): 8-12. doi: 10.11896/jsjkx.180901813

Big Data & Data Science

AdaBoostRS: Integration of High-dimensional Unbalanced Data Learning

YANG Ping-an, LIN Ya-ping, ZHU Tuan-fei   

(College of Information Science and Engineering, Hunan University, Changsha 410000, China)
Received: 2018-09-27; Online: 2019-12-15; Published: 2019-12-17

Abstract: The class imbalance problem in machine learning refers to a skewed distribution of samples among classes, which biases learning toward the majority class. In high-dimensional data, sparsity makes this classification bias even more pronounced. High-dimensional unbalanced data therefore combine two challenging problems, the curse of dimensionality and an imbalanced class distribution, which makes them especially difficult to classify. This paper proposes AdaBoostRS (AdaBoost ensemble of Random subspace and SMOTE), an AdaBoost ensemble method that combines the random subspace method with SMOTE oversampling to classify high-dimensional unbalanced data. AdaBoostRS trains each base classifier on a subset of features drawn from a random subspace, which increases the diversity of the training samples and reduces the dimensionality of the data. SMOTE then generates synthetic minority-class samples in the reduced-dimension space by linear interpolation, addressing the class imbalance. Experiments on 8 high-dimensional unbalanced standard time series datasets show that AdaBoostRS outperforms traditional ensemble learning methods on three performance indicators: F-measure, G-mean and AUC.
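The abstract describes AdaBoostRS only at a high level, so the following NumPy sketch shows one plausible way to combine the three ingredients it names. It is an illustration under stated assumptions, not the authors' implementation. Each boosting round draws a random feature subspace, oversamples the minority class in that subspace with SMOTE, fits a weighted base learner, and performs an AdaBoost weight update; F-measure, G-mean and AUC are computed at the end. Assumptions not given in the abstract: decision stumps as base learners, a subspace of half the features (`subspace_ratio=0.5`), SMOTE up to class balance, synthetic samples weighted with the mean minority weight, and weighted error measured on the original samples only (in the spirit of SMOTEBoost). All function names and the toy data are hypothetical.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X_min)  # assumes at least 2 minority samples
    k = min(k, n - 1)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    nn = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])


def fit_stump(X, y, w):
    """Weighted decision stump (assumed base learner). Returns (feature, threshold, polarity)."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        thr = np.unique(X[:, j])
        pred = np.where(X[:, j][:, None] <= thr[None, :], 1, -1)  # (n_samples, n_thresholds)
        err = ((pred != y[:, None]) * w[:, None]).sum(axis=0)
        for polarity, e in ((1, err), (-1, w.sum() - err)):  # flipping polarity flips every error
            i = int(np.argmin(e))
            if e[i] < best[0]:
                best = (e[i], j, thr[i], polarity)
    return best[1:]


def stump_predict(stump, X):
    j, thr, polarity = stump
    return polarity * np.where(X[:, j] <= thr, 1, -1)


def adaboost_rs(X, y, n_rounds=20, subspace_ratio=0.5, k=5, seed=0):
    """y in {-1, +1} with +1 = minority. Returns a list of (feature_idx, stump, alpha)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(1, int(subspace_ratio * d))
    w = np.full(n, 1.0 / n)
    pos = y == 1
    ensemble = []
    for _ in range(n_rounds):
        # 1) Random subspace: this round sees only m randomly chosen features.
        feats = rng.choice(d, size=m, replace=False)
        Xs = X[:, feats]
        # 2) SMOTE in the subspace until both classes have the same size.
        n_new = max(0, int((~pos).sum() - pos.sum()))
        X_tr = np.vstack([Xs, smote(Xs[pos], n_new, k, rng)])
        y_tr = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
        # Synthetic samples get the mean minority weight (assumption).
        w_tr = np.concatenate([w, np.full(n_new, w[pos].mean())])
        stump = fit_stump(X_tr, y_tr, w_tr / w_tr.sum())
        # 3) AdaBoost update, with the error measured on the original samples only.
        h = stump_predict(stump, Xs)
        err = w[h != y].sum()
        if err >= 0.5:
            continue  # no better than chance on this subspace; skip the round
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * h)
        w /= w.sum()
        ensemble.append((feats, stump, alpha))
    return ensemble


def decision_function(ensemble, X):
    return sum((a * stump_predict(s, X[:, f]) for f, s, a in ensemble), np.zeros(len(X)))


def imbalance_metrics(y_true, score):
    """F-measure and G-mean of sign(score) with +1 as positive, plus rank-based AUC."""
    y_pred = np.where(score >= 0, 1, -1)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == -1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == -1))
    tn = np.sum((y_pred == -1) & (y_true == -1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = np.sqrt(recall * (tn / (tn + fp) if tn + fp else 0.0))
    s_pos, s_neg = score[y_true == 1], score[y_true == -1]
    auc = (s_pos[:, None] > s_neg[None, :]).mean() + 0.5 * (s_pos[:, None] == s_neg[None, :]).mean()
    return f1, g_mean, auc


# Toy usage on synthetic data (hypothetical, not the paper's datasets).
rng = np.random.default_rng(1)
X = rng.normal(size=(220, 20))
y = np.where(np.arange(220) < 20, 1, -1)  # 20 minority vs. 200 majority samples
X[y == 1, :10] += 3.0                     # minority shifted on half of the features
model = adaboost_rs(X, y)
print(imbalance_metrics(y, decision_function(model, X)))  # training-set scores only
```

Drawing a fresh subspace in every round lowers the dimension in which SMOTE's nearest-neighbour search operates and makes the base learners differ from one another, which mirrors the dimensionality-reduction and diversity argument in the abstract. The printed scores are on the training data and only illustrate the mechanics; they say nothing about the paper's reported results.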

Key words: AdaBoost, High-dimensional imbalance, Random subspace, SMOTE

CLC Number: TP301.6