Computer Science ›› 2025, Vol. 52 ›› Issue (9): 220-231. doi: 10.11896/jsjkx.241000010

• Database & Big Data & Data Science •

Synthetic Oversampling Method Based on Noiseless Gradient Distribution

HU Libin1, ZHANG Yunfeng2, LIU Peide3   

  1 School of Management Science and Engineering, Shandong University of Finance and Economics, Jinan 250014, China
  2 School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
  3 Shandong Key Laboratory of Blockchain Finance, Shandong University of Finance and Economics, Jinan 250014, China
  • Received: 2024-10-08  Revised: 2025-02-15  Online: 2025-09-15  Published: 2025-09-11
  • About author: HU Libin, born in 1990, Ph.D., is a member of CCF (No.V6549G). His main research interests include data mining, artificial intelligence and intelligent financial risk control.
    ZHANG Yunfeng, born in 1977, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No.19888M). His main research interests include graphics, artificial intelligence, data mining and visualization.
  • Supported by:
    Natural Science Foundation of Shandong Province (ZR2022MF245) and Key R&D Program of Shandong Province (2023CXPT033).

Abstract: Synthetic oversampling is an important means of solving imbalanced classification problems, but current oversampling methods still struggle with high-dimensional imbalanced data. A synthetic oversampling method based on noiseless gradient distribution is proposed to address three issues in current methods: error accumulation caused by noise samples, excessive dependence on distance in the sample space, and reduced recognition accuracy for negative class samples. Firstly, the gradient contribution of each sample is used as a metric of its label confidence, and samples with noisy labels are filtered out of the data set, which avoids the error accumulation that arises when noise samples serve as root samples. Secondly, the positive samples are assigned to gradient intervals according to the gradient contribution metric and a safe gradient threshold; samples in the safe gradient intervals are selected as root samples, and the gradient right nearest neighbor of each root sample is taken as its auxiliary sample. This removes the dependence on spatial distance measurement and ensures that the decision boundary moves continuously toward the negative class samples. Finally, a safe gradient distribution approximation strategy based on cosine similarity is designed to calculate the number of samples to generate in each safe gradient interval. The resulting distribution of synthesized samples moves the decision boundary toward the negative class samples in a safe way, so the recognition accuracy of negative class samples is not significantly sacrificed. Experiments on datasets from the KEEL, UCI and Kaggle platforms show that the proposed algorithm not only improves the Recall of the classifier but also obtains satisfactory F1-Score, G-Mean and MCC values.
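
The first two steps of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several points are assumptions, since the abstract does not define them: the gradient contribution is taken to be |p - y| (the magnitude of the cross-entropy gradient with respect to the logit), the "gradient right nearest neighbor" is taken to be the next positive sample in ascending gradient order, and the names `gradient_contribution`, `gradient_oversample`, `noise_thr` and `safe_thr` are hypothetical. The cosine-similarity allocation of samples per interval is omitted; roots are drawn uniformly instead.

```python
import random


def gradient_contribution(prob, label):
    # Assumed metric: |p - y| for a binary classifier that outputs P(y = 1).
    return abs(prob - label)


def gradient_oversample(X, y, probs, n_new, noise_thr=0.9, safe_thr=0.5, seed=0):
    """Return (kept samples, synthetic positive samples).

    noise_thr and safe_thr are hypothetical thresholds chosen for illustration.
    """
    rng = random.Random(seed)

    # Step 1: score every sample and drop those that look like label noise.
    scored = [(x, label, gradient_contribution(p, label))
              for x, label, p in zip(X, y, probs)]
    kept = [s for s in scored if s[2] <= noise_thr]

    # Step 2: order positives by gradient; those at or below safe_thr are roots.
    pos = sorted((s for s in kept if s[1] == 1), key=lambda s: s[2])
    roots = [i for i, s in enumerate(pos) if s[2] <= safe_thr]

    synthetic = []
    if not roots:
        return kept, synthetic

    for _ in range(n_new):
        i = rng.choice(roots)             # uniform draw (simplification)
        j = min(i + 1, len(pos) - 1)      # assumed "right neighbor" in gradient order
        lam = rng.random()
        root, aux = pos[i][0], pos[j][0]
        # Interpolate between the root and its auxiliary sample.
        synthetic.append([a + lam * (b - a) for a, b in zip(root, aux)])
    return kept, synthetic


# Toy data: probs are a classifier's predicted P(y = 1) for each row of X.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [5.0, 5.0], [9.0, 9.0]]
y = [1, 1, 1, 0, 1]
probs = [0.9, 0.7, 0.3, 0.1, 0.02]
kept, synthetic = gradient_oversample(X, y, probs, n_new=3)
```

In this toy run the last sample is labelled positive but predicted with probability 0.02, so its gradient contribution exceeds `noise_thr` and it is removed before any root is chosen. Because each auxiliary sample has a gradient no smaller than its root's, interpolation moves from confidently classified positives toward harder ones without using spatial distance.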

Key words: Gradient contribution, Noiseless gradient, Gradient right neighbor, Safe gradient distribution approximation, Synthetic oversampling

CLC Number: TP181