Computer Science ›› 2026, Vol. 53 ›› Issue (6A): 250600199-8.doi: 10.11896/jsjkx.250600199

• Big Data & Data Science •

Imbalanced Data Learning Approach Utilizing Feature Value Based Class Overlap Degree

SUN Bo1, WANG Zhijun1, ZHOU Zhunan1, LI Qingjie2, WANG Yun1, GENG Xia1, ZHANG Yan1, SUN Chenxuan1

  1. College of Information Science and Engineering, Shandong Agricultural University, Taian, Shandong 271018, China
  2. Shandong Taian Yingxiongshan Middle School, Taian, Shandong 271000, China
  • Online:2026-06-16 Published:2026-06-12
  • About author: SUN Bo, born in 1987, Ph.D., associate professor, master's supervisor. His main research interests include machine learning, artificial neural networks and security artificial intelligence.
    WANG Zhijun, born in 1974, Ph.D., professor. His main research interests include agricultural informatization, information security, and machine learning.
  • Supported by:
    Natural Science Foundation of Shandong Province (ZR2023MF098, ZR2023MA011, ZR2021MC168, ZR2018QF002) and R&D and Application Demonstration of Key Technologies for Intelligent Production in Fruit Industry (2022TZXD0011).

Abstract: Class imbalance is an important challenge in supervised machine learning. In an imbalanced training set, the minority class is significantly outnumbered by the majority class, yet it usually attracts more attention from practitioners and carries a higher misclassification cost. Most classifier learning algorithms optimize overall classification accuracy and therefore tend to misclassify minority class examples, which contribute little to that accuracy. Existing imbalance learning approaches often use the class imbalance ratio (IR) of a training set both as the classification complexity measure and as the optimization goal. However, recent work indicates that class overlap measures the learning difficulty of an imbalanced dataset more objectively than IR. Given the importance of class overlap in evaluating data complexity, this paper addresses the imbalance problem from the class overlap perspective and proposes FO-RBU, an imbalanced data learning approach that exploits the class overlap information of a training set. Specifically, the distribution of the ratios of feature-based class overlap examples is used to evaluate the learning difficulty of an imbalanced dataset, and then serves as a theoretical guideline for determining the proper undersampling extent of the Radial-Based Undersampling approach. Experimental results show that feature-value-based class overlap information is a good indicator for determining the proper undersampling ratio, and that the proposed class imbalance learning approach FO-RBU is effective.
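
To make the two complexity measures named in the abstract concrete, the following is a minimal sketch, not the FO-RBU procedure itself. The abstract does not give the paper's exact definition of feature-based class overlap, so the code assumes a simple range-intersection definition: for each feature, an example counts as overlapping if its value falls inside the interval shared by the value ranges of both classes. The function names `imbalance_ratio` and `feature_overlap_ratios` and the toy data are hypothetical.

```python
import numpy as np

def imbalance_ratio(y):
    """IR = majority class count / minority class count (binary labels 0/1)."""
    counts = np.bincount(y)
    counts = counts[counts > 0]
    return counts.max() / counts.min()

def feature_overlap_ratios(X, y):
    """Per-feature fraction of examples lying in the class-overlap interval.

    Assumed definition (not taken from the paper): the overlap interval of a
    feature is the intersection of the [min, max] ranges of the two classes.
    """
    X0, X1 = X[y == 0], X[y == 1]
    lo = np.maximum(X0.min(axis=0), X1.min(axis=0))  # lower bound of overlap
    hi = np.minimum(X0.max(axis=0), X1.max(axis=0))  # upper bound of overlap
    inside = (X >= lo) & (X <= hi)  # empty interval (lo > hi) matches nothing
    return inside.mean(axis=0)

# Toy data: 4 majority examples (label 0), 2 minority examples (label 1).
X = np.array([[0.0, 10.0],
              [1.0, 11.0],
              [2.0, 12.0],
              [3.0, 13.0],
              [2.5, 20.0],
              [4.0, 21.0]])
y = np.array([0, 0, 0, 0, 1, 1])

ir = imbalance_ratio(y)
ratios = feature_overlap_ratios(X, y)
print("IR:", ir)
print("overlap ratio per feature:", ratios)
```

In this toy set the first feature has overlapping class ranges while the second separates the classes completely, so the two features yield different overlap ratios even though the IR is the same for both. How FO-RBU maps the distribution of such ratios to an undersampling extent is specified in the full paper, not here.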

Key words: Classification, Class imbalanced data, Class imbalance ratio (IR), Class overlap, Undersampling, Radial-based undersampling, Undersampling ratio, Machine learning

CLC Number: 

  • TP181