Computer Science ›› 2021, Vol. 48 ›› Issue (9): 77-85. doi: 10.11896/jsjkx.200900013

Special Issue: Intelligent Data Governance Technologies and Systems

Cost-sensitive Convolutional Neural Network Based Hybrid Method for Imbalanced Data Classification

HUANG Ying-qi, CHEN Hong-mei   

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Key Laboratory of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu 611756, China
  • Received: 2020-09-02 Revised: 2021-01-21 Online: 2021-09-15 Published: 2021-09-10
  • About author: HUANG Ying-qi, born in 1988, postgraduate. Her main research interests include machine learning and data mining.
    CHEN Hong-mei, born in 1971, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. Her main research interests include granular computing, rough sets and intelligent information processing.
  • Supported by:
    National Natural Science Foundation of China (61976182, 62076171) and Key Program for International S&T Cooperation of Sichuan Province (2019YFH0097).

Abstract: Imbalanced classification is a common problem in data mining. A skewed class distribution generally degrades classifier performance. Convolutional neural networks are efficient data mining tools and are widely used in classification tasks, but when training is adversely affected by data imbalance, classification accuracy on the minority class decreases. To address two-class imbalanced data classification, this paper proposes a hybrid method based on a cost-sensitive convolutional neural network. The proposed method first combines the density peak clustering algorithm with SMOTE and preprocesses the data by oversampling to reduce the imbalance of the original data set. Cost-sensitive learning is then used to assign different weights to the different classes in the imbalanced data. Additionally, the Euclidean distance between the predicted value and the label value is taken into account. By assigning different cost losses to the majority class and the minority class, the method constructs a cost-sensitive convolutional neural network model that improves the recognition rate of the convolutional neural network on the minority class. Six different datasets are used to verify the effectiveness of the proposed method. The experimental results show that the proposed method improves the classification performance of the convolutional neural network model on imbalanced data.
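
The two ingredients named in the abstract, SMOTE-style oversampling and a class-weighted loss built on the distance between prediction and label, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The density peak clustering step is omitted, so `smote_oversample` is plain SMOTE interpolation, and the exact form of the paper's loss is not given in the abstract, so `cost_sensitive_loss` assumes a class-weighted mean squared Euclidean distance. The function names and the weights `w_min` and `w_maj` are hypothetical.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, rng=None):
    """Plain SMOTE (assumption: no density peak clustering step).

    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority-class neighbours.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(rng)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # seed samples
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def cost_sensitive_loss(y_true, y_prob, w_min=5.0, w_maj=1.0):
    """Assumed loss form: class-weighted mean squared distance.

    Label 1 marks the minority class; w_min and w_maj are hypothetical
    per-class costs, not values taken from the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    weights = np.where(y_true == 1, w_min, w_maj)
    return float(np.mean(weights * (y_prob - y_true) ** 2))


if __name__ == "__main__":
    X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    synthetic = smote_oversample(X_minority, n_new=6, k=2, rng=0)
    print(synthetic.shape)
    print(cost_sensitive_loss([1, 0], [0.5, 0.5]))
```

With these assumed weights, an equally sized error on a minority sample contributes five times as much to the loss as one on a majority sample, which is the mechanism a cost-sensitive network would rely on to raise minority-class recognition.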

Key words: Convolutional neural network, Cost-sensitive loss function, Data preprocessing, Imbalance classification, Oversampling

CLC Number: 

  • TP391