Computer Science ›› 2023, Vol. 50 ›› Issue (10): 48-58. doi: 10.11896/jsjkx.230600022
严远亭, 马迎澳, 任艳平, 张燕平
YAN Yuanting, MA Yingao, REN Yanping, ZHANG Yanping
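Abstract: Majority-class undersampling is one of the mainstream data-level techniques for learning from imbalanced data. In recent years, researchers have proposed a series of undersampling methods, but most of them focus on how to select representative samples so as to reduce information loss. Preserving the internal structural information of the majority class during undersampling, however, remains a major challenge. To address this challenge, an undersampling method for imbalanced datasets based on a constructive neural network and global distribution density is proposed. The method first designs a constructive-neural-network-based approach for learning local patterns of the majority class. Based on these local patterns, it then designs two structure-preserving sample selection strategies. Finally, because the randomness of local pattern learning may lead to suboptimal sampling results, a bagging ensemble strategy is introduced to improve the performance of the method. Comparative experiments against 13 methods on 59 datasets verify the effectiveness of the proposed method on three commonly used metrics: G-mean, AUC, and F1-score.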
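The three stages named in the abstract (local pattern learning, structure-preserving selection, bagging) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so the sketch assumes a simplified Euclidean sphere covering in place of the constructive neural network, and a proportional per-cover quota in place of the paper's two selection strategies. The function names `learn_covers`, `undersample`, and `bagged_subsets` are hypothetical.

```python
import numpy as np

def learn_covers(X_maj, X_min, rng):
    """Greedy sphere covering of the majority class (simplified assumption).

    Each cover is centered on a random uncovered majority sample; its radius
    is the distance from that center to the nearest minority sample.
    """
    uncovered = np.ones(len(X_maj), dtype=bool)
    covers = []
    while uncovered.any():
        center = rng.choice(np.flatnonzero(uncovered))
        radius = np.linalg.norm(X_min - X_maj[center], axis=1).min()
        dist = np.linalg.norm(X_maj - X_maj[center], axis=1)
        members = np.flatnonzero(uncovered & (dist < radius))
        if members.size == 0:  # center coincides with a minority sample
            members = np.array([center])
        covers.append(members)
        uncovered[members] = False
    return covers

def undersample(X_maj, X_min, n_keep, rng):
    """Pick n_keep majority indices, with a quota per cover proportional to its size."""
    if not 1 <= n_keep <= len(X_maj):
        raise ValueError("n_keep must be between 1 and len(X_maj)")
    covers = learn_covers(X_maj, X_min, rng)
    sizes = np.array([len(c) for c in covers])
    exact = n_keep * sizes / sizes.sum()
    quota = np.floor(exact).astype(int)
    # Largest-remainder rounding so the quotas sum to exactly n_keep.
    leftover = n_keep - quota.sum()
    order = np.argsort(-(exact - quota))
    quota[order[:leftover]] += 1
    picked = [rng.choice(c, size=q, replace=False)
              for c, q in zip(covers, quota) if q > 0]
    return np.concatenate(picked)

def bagged_subsets(X_maj, X_min, n_bags=5, seed=0):
    """Repeat the randomized undersampling to get one index set per bag."""
    rng = np.random.default_rng(seed)
    n_keep = min(len(X_min), len(X_maj))
    return [undersample(X_maj, X_min, n_keep, rng) for _ in range(n_bags)]
```

Each returned index array selects a majority subset the same size as the minority class. In a bagging setup, one base classifier would be trained per subset together with all minority samples, and the predictions combined by voting. Because the cover centers are drawn at random, the subsets differ between bags, which is the source of randomness the ensemble is meant to average out.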