Computer Science, 2019, Vol. 46, Issue (1): 94-99. doi: 10.11896/j.issn.1002-137X.2019.01.014

CCDM2018

Sample Adaptive Classifier for Imbalanced Data

CAI Zi-xin, WANG Xin-yue, XU Jian, JING Li-ping   

1. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
Received: 2018-04-23; Online: 2019-01-15; Published: 2019-02-25

Abstract: In the era of big data, imbalanced data is ubiquitous and unavoidable, and classifying it has become a critical problem. Taking binary classification as an example, traditional learning algorithms cannot adequately learn the hidden patterns of the minority class and tend to be biased towards the majority class. An effective remedy is cost-sensitive learning, which assigns a higher cost to misclassifying minority instances and thereby improves prediction performance on the minority class. However, these methods treat all instances within a class equally, even though different instances may contribute differently to the learning process. To make cost-sensitive learning more effective, this paper proposes a sample-adaptive, cost-sensitive strategy for imbalanced data classification, which assigns each individual instance its own weight to be applied when it is misclassified. First, the strategy estimates the distance between each instance and the decision boundary from the instance's local distribution. Then, building on margin theory, it assigns higher weights to instances nearer the boundary. The proposed strategy is applied to the classical large margin distribution machine (LDM), and a series of experiments on UCI datasets shows that it effectively improves classifier performance on imbalanced data.
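The abstract does not specify how the local distribution is measured or how boundary distances are turned into weights. The Python sketch below illustrates the general idea under stated assumptions only: closeness to the boundary is approximated by the share of an instance's k nearest neighbours (Euclidean distance) that carry the other label, and the class-level cost is the inverse class frequency. The function name `sample_adaptive_weights`, the parameter `k`, and the weighting formula are hypothetical and are not the authors' method. The resulting weights would be passed as per-instance misclassification costs to a cost-sensitive learner such as LDM, which is not implemented here.

```python
import math
from collections import Counter


def sample_adaptive_weights(X, y, k=5):
    """Hypothetical per-instance misclassification weights (not the paper's formula).

    Assumptions made for illustration only:
    * class-level cost = inverse class frequency, so minority errors cost more;
    * closeness to the boundary = share of the k nearest neighbours
      (Euclidean distance) whose label differs from the instance's label.
    """
    n = len(X)
    if n < 2:
        raise ValueError("need at least two instances")
    k = max(1, min(k, n - 1))
    counts = Counter(y)
    weights = []
    for i in range(n):
        # Sort the other instances by distance to X[i]; ties break by index.
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        neighbours = [j for _, j in dists[:k]]
        # 0.0 = pure neighbourhood (far from the boundary); 1.0 = all neighbours
        # come from the other class (on or across the boundary, or label noise).
        opposite = sum(y[j] != y[i] for j in neighbours) / k
        class_cost = n / (len(counts) * counts[y[i]])
        # Boundary instances get up to twice their class-level cost.
        weights.append(class_cost * (1.0 + opposite))
    return weights


if __name__ == "__main__":
    # Five majority points on a line, one minority point next to them, one far away.
    X = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4.5, 0), (10, 0)]
    y = [0, 0, 0, 0, 0, 1, 1]
    print([round(w, 3) for w in sample_adaptive_weights(X, y, k=2)])
    # prints [0.7, 0.7, 0.7, 0.7, 1.05, 3.5, 2.625]
```

In this toy example the minority point sitting next to the majority cluster receives the largest weight (3.5), while the isolated minority point receives 2.625. One limitation of this proxy is that an instance whose neighbours all carry the other label may be label noise rather than a boundary point; the sketch does not distinguish the two.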

Key words: Boundary sample, Classification, Cost-sensitive learning

CLC Number: TP391