Computer Science, 2025, Vol. 52, Issue (9): 220-231. doi: 10.11896/jsjkx.241000010
胡立彬1, 张云峰2, 刘培德3
HU Libin1, ZHANG Yunfeng2, LIU Peide3
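Abstract: Synthetic oversampling is an important means of addressing imbalanced classification, but existing synthetic oversampling methods still face many challenges on high-dimensional imbalanced classification problems. Existing methods have three shortcomings: they ignore the error accumulation caused by noisy samples, they depend excessively on distances in the sample space, and the distribution of their synthetic samples sacrifices recognition accuracy on negative-class samples. To address these three problems, a synthetic oversampling method based on noise-free gradient distribution is proposed. First, the gradient contribution attribute of each sample is used as a measure of its label confidence, and samples with noisy labels are filtered out of the dataset, which avoids the error accumulation caused by using noisy samples as root samples. Second, positive-class samples are assigned to different gradient intervals according to the gradient contribution index and a safe gradient threshold; samples within the safe gradient intervals are selected as root samples, and the right gradient neighbor of each root sample serves as its auxiliary sample. This removes the dependence on spatial distance metrics and ensures that the decision boundary keeps moving toward the negative-class samples. Finally, a safe gradient distribution approximation strategy based on cosine similarity is designed to compute the number of samples to generate in each safe gradient interval. The sample distribution produced by this strategy moves the decision boundary toward the negative-class samples in a safe manner, so the recognition accuracy on negative-class samples is not noticeably sacrificed. Experiments on datasets from the KEEL, UCI, and Kaggle platforms show that the proposed algorithm improves the Recall of the classifier while also achieving good F1-Score, G-Mean, and MCC values.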
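The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything below is an assumption made for illustration: the gradient contribution is taken to be |p - y| (the magnitude of the binary cross-entropy gradient with respect to the logit), the "right gradient neighbor" is taken to be the next sample in ascending gradient order, and the names `gradient_contribution`, `gradient_oversample`, `noise_thr`, `safe_thr`, and `n_bins` are hypothetical. The cosine-similarity allocation is not specified in the abstract, so the sketch swaps in an equal split across occupied safe intervals.

```python
import numpy as np

def gradient_contribution(p, y):
    # Assumed definition: |p - y|, the magnitude of the binary
    # cross-entropy gradient with respect to the logit.
    return np.abs(np.asarray(p, dtype=float) - np.asarray(y, dtype=float))

def gradient_oversample(X_pos, g_pos, n_new, noise_thr=0.9, safe_thr=0.5,
                        n_bins=4, seed=0):
    """Hypothetical sketch: filter suspected noisy positives, then interpolate
    each root sample toward its right gradient neighbor."""
    rng = np.random.default_rng(seed)
    X_pos = np.asarray(X_pos, dtype=float)
    g_pos = np.asarray(g_pos, dtype=float)

    # Step 1: treat very large gradients as suspected label noise and drop them.
    keep = g_pos < noise_thr
    X, g = X_pos[keep], g_pos[keep]

    # Sort by gradient so the right neighbor of index i is index i + 1.
    order = np.argsort(g)
    X, g = X[order], g[order]

    # Step 2: roots are samples at or below the safe threshold
    # that still have a right neighbor.
    roots = np.flatnonzero(g[:-1] <= safe_thr)
    if roots.size == 0 or n_new <= 0:
        return np.empty((0, X_pos.shape[1]))

    # Assign each root to one of n_bins equal-width intervals on [0, safe_thr].
    bin_of = np.minimum((g[roots] / safe_thr * n_bins).astype(int), n_bins - 1)
    occupied = np.unique(bin_of)

    # Step 3 (swapped in): equal split across occupied intervals. The
    # cosine-similarity allocation from the abstract is NOT reproduced here.
    counts = np.full(occupied.size, n_new // occupied.size)
    counts[: n_new % occupied.size] += 1

    synthetic = []
    for b, c in zip(occupied, counts):
        picks = rng.choice(roots[bin_of == b], size=c, replace=True)
        lam = rng.random((c, 1))
        # Linear interpolation between a root and its right gradient neighbor.
        synthetic.append(X[picks] + lam * (X[picks + 1] - X[picks]))
    return np.vstack(synthetic)

# Toy usage: five positive samples, the last one looks like label noise.
X_pos = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [9.0, -9.0]])
p = np.array([0.95, 0.8, 0.6, 0.4, 0.02])   # predicted positive-class probability
g = gradient_contribution(p, np.ones(5))
new = gradient_oversample(X_pos, g, n_new=6)
```

In this toy run the fifth sample has a gradient of about 0.98 and is filtered out, so every synthetic point lies on the segment between neighboring clean samples. Because each root is paired with a neighbor of larger gradient rather than a nearest neighbor in feature space, no distance computation is needed.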