Computer Science, 2017, Vol. 44, Issue (8): 225-229. doi: 10.11896/j.issn.1002-137X.2017.08.038


Imbalanced Data Classification Method Based on Neighborhood Hybrid Sampling and Dynamic Ensemble

GAO Feng and HUANG Hai-yan   

Online: 2018-11-13   Published: 2018-11-13

Abstract: Class imbalance severely degrades the performance of traditional classification algorithms, lowering the recognition rate of the minority class. To address this problem, a hybrid sampling technique based on neighborhood characteristics is proposed to improve the classification accuracy of the minority class. The technique adjusts the sampling weight of each sample according to the class distribution in its neighborhood, and uses hybrid sampling to obtain balanced data subsets. The base classifiers are then generated, and for each test sample a dynamic ensemble method based on local confidence selects the optimal set of base classifiers. Experiments on UCI datasets show that the method achieves high classification accuracy on both the minority and the majority class of imbalanced datasets.

Key words: Data mining, Imbalanced data, K-nearest neighbor, Hybrid sampling, Ensemble learning
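
The abstract gives no formulas, so the following Python sketch is only one plausible reading of the pipeline it describes, not the authors' algorithm. The weighting rule (count of opposite-class neighbors), the nearest-centroid base learner, the "keep the classifiers with the best local accuracy" selection rule, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def knn(points, query, k, skip=None):
    """Indices of the k points nearest to `query` (Euclidean), optionally skipping one index."""
    order = sorted((math.dist(query, p), j) for j, p in enumerate(points) if j != skip)
    return [j for _, j in order[:k]]

def sampling_weights(points, labels, minority, k=3):
    """Assumed rule: a minority sample gains weight with each majority neighbor;
    a majority sample gains weight with each same-class neighbor."""
    weights = []
    for i, (p, y) in enumerate(zip(points, labels)):
        other = sum(labels[j] != y for j in knn(points, p, k, skip=i))
        weights.append(other + 1 if y == minority else k - other + 1)
    return weights

def hybrid_sample(points, labels, minority, k=3, rng=random):
    """Indices of a balanced subset: oversample the minority, undersample the majority."""
    w = sampling_weights(points, labels, minority, k)
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]
    target = len(labels) // 2
    # Oversample with replacement, proportionally to the neighborhood weight.
    over = rng.choices(min_idx, weights=[w[i] for i in min_idx], k=target)
    # Undersample without replacement using weighted random keys.
    keyed = sorted(maj_idx, key=lambda i: rng.random() ** (1.0 / w[i]), reverse=True)
    return over + keyed[:target]

def train_centroids(points, labels, idx):
    """Stand-in base learner (assumption): one mean vector per class."""
    groups = {}
    for i in idx:
        groups.setdefault(labels[i], []).append(points[i])
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps)) for y, ps in groups.items()}

def predict(model, x):
    """Label of the class centroid closest to x."""
    return min(model, key=lambda y: math.dist(model[y], x))

def dynamic_predict(models, points, labels, x, k=5):
    """Vote only with the base classifiers most accurate on x's k nearest training samples."""
    nbrs = knn(points, x, k)
    conf = [sum(predict(m, points[j]) == labels[j] for j in nbrs) / len(nbrs) for m in models]
    best = max(conf)
    votes = Counter(predict(m, x) for m, c in zip(models, conf) if c == best)
    return votes.most_common(1)[0][0]

# Toy imbalanced data (synthetic, not from the paper): 40 majority vs. 8 minority points.
rng = random.Random(0)
maj_pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
min_pts = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(8)]
points, labels = maj_pts + min_pts, [0] * 40 + [1] * 8

subsets = [hybrid_sample(points, labels, minority=1, rng=rng) for _ in range(5)]
models = [train_centroids(points, labels, s) for s in subsets]
print(dynamic_predict(models, points, labels, (3.0, 3.0)))
```

Each call to `hybrid_sample` draws a different balanced subset, so the base classifiers differ from one another; `dynamic_predict` then re-selects among them for every test point instead of using a fixed ensemble.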

