Computer Science (计算机科学), 2021, Vol. 48, Issue (9): 77-85. doi: 10.11896/jsjkx.200900013
Special topic: Intelligent Data Governance Technologies and Systems
黄颖琦, 陈红梅
HUANG Ying-qi, CHEN Hong-mei
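Abstract: Class imbalance is a pervasive problem in data mining: a skewed class distribution degrades classifier performance. Convolutional neural networks (CNNs) are efficient data mining tools that are widely used for classification, but when training is adversely affected by imbalanced data, classification accuracy on the minority class drops. For binary classification of imbalanced data, this paper proposes a hybrid method based on a cost-sensitive convolutional neural network. First, the density peaks clustering algorithm is combined with SMOTE to preprocess the data by oversampling, which reduces the degree of imbalance in the original dataset. Then, following the cost-sensitive idea, different classes in the imbalanced data are given different weights, and the Euclidean distance between the predicted value and the label is taken into account, so that the majority and minority classes incur different cost losses. The result is a cost-sensitive CNN model that improves the network's recognition rate on the minority class. Six different datasets are used to verify the effectiveness of the proposed method. The experimental results show that the proposed method can improve the classification performance of CNN models on imbalanced data.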
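The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract gives neither the density peaks clustering step nor the exact loss formula, so the code below makes two simplifying assumptions: it uses plain SMOTE interpolation (no clustering), and it takes the loss to be a class-weighted squared Euclidean distance between the network output and a one-hot label. The names `smote_oversample`, `cost_sensitive_loss`, `minority_cost`, and `majority_cost` are hypothetical, and label `1` is assumed to mark the minority class.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Plain SMOTE only (assumption): the paper's density peaks clustering step
    is not reproduced here. Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def cost_sensitive_loss(predictions, labels, minority_cost, majority_cost=1.0):
    """Mean class-weighted squared Euclidean distance to one-hot labels.

    Assumed form of the loss; label 1 is treated as the minority class.
    """
    total = 0.0
    for pred, label in zip(predictions, labels):
        target = [1.0, 0.0] if label == 0 else [0.0, 1.0]
        distance_sq = sum((p - t) ** 2 for p, t in zip(pred, target))
        weight = minority_cost if label == 1 else majority_cost
        total += weight * distance_sq
    return total / len(labels)


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote_oversample(minority, n_new=4, k=2))
    # The same prediction error costs more when the true class is the minority.
    print(cost_sensitive_loss([[0.8, 0.2]], [1], minority_cost=5.0))
    print(cost_sensitive_loss([[0.2, 0.8]], [0], minority_cost=5.0))
```

With `minority_cost` greater than `majority_cost`, an error of the same size contributes more to the loss when the true class is the minority, which is the mechanism the abstract credits for improving minority-class recognition.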