Computer Science, 2023, Vol. 50, Issue (10): 48-58. doi: 10.11896/jsjkx.230600022

• Granular Computing & Knowledge Discovery •

Imbalanced Undersampling Based on Constructive Neural Network and Global Density Information

YAN Yuanting, MA Yingao, REN Yanping, ZHANG Yanping   

  1. College of Computer Science and Technology, Anhui University, Hefei 230601, China
  • Received: 2023-06-02 Revised: 2023-08-08 Online: 2023-10-10 Published: 2023-10-10
  • About author: YAN Yuanting, born in 1986, Ph.D., associate professor, is a member of the China Computer Federation. His main research interests include data mining, machine learning and granular computing.
  • Supported by:
    National Natural Science Foundation of China (61806002).

Abstract: Undersampling is one of the mainstream data-level techniques for dealing with imbalanced data. In recent years, researchers have proposed numerous undersampling methods, but most of them focus on selecting representative majority class samples to avoid losing informative data. How to preserve the structure of the original majority class during undersampling remains an open challenge. To this end, an undersampling method for imbalanced data classification is proposed, based on a constructive neural network and data density. First, it detects local patterns of the majority class with a simplified constructive process. Then, two sample selection strategies are designed to maintain the structure of the selected groups according to the distribution information of the original majority class. Finally, because the randomness of local pattern learning may lead to non-optimal sampling results, the bagging technique is introduced to further improve learning performance. Comparative experiments against 13 methods on 59 datasets verify the effectiveness of the proposed method in terms of three metrics: G-mean, AUC and F1-score.
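
The three stages named in the abstract can be illustrated with a short Python sketch. It is not the authors' algorithm: the abstract does not specify the constructive process or the two selection strategies, so everything below is an assumption made for illustration. Here the "local patterns" are spheres centred on random majority samples with a radius equal to the distance to the nearest minority sample, the selection strategy is a single proportional quota per sphere followed by random picks, and the function names (cover_majority, undersample, bagged_subsets) are hypothetical.

```python
import math
import random


def cover_majority(majority, minority, rng):
    """Partition majority indices into sphere-shaped groups (assumed local patterns)."""
    uncovered = set(range(len(majority)))
    groups = []
    while uncovered:
        c = rng.choice(sorted(uncovered))  # random centre among uncovered samples
        center = majority[c]
        # Assumed radius rule: distance from the centre to its nearest minority sample.
        radius = min(math.dist(center, m) for m in minority)
        # The centre always joins its own group, so every iteration makes progress.
        group = [i for i in sorted(uncovered)
                 if i == c or math.dist(center, majority[i]) < radius]
        groups.append(group)
        uncovered -= set(group)
    return groups


def undersample(majority, minority, n_keep, rng):
    """Keep n_keep majority indices, with per-group quotas proportional to group size."""
    total = len(majority)
    n_keep = min(n_keep, total)
    groups = cover_majority(majority, minority, rng)
    # Largest-remainder allocation in integer arithmetic, so quotas sum to n_keep.
    quotas = [n_keep * len(g) // total for g in groups]
    remainders = [n_keep * len(g) % total for g in groups]
    leftover = n_keep - sum(quotas)
    by_remainder = sorted(range(len(groups)), key=lambda k: -remainders[k])
    for k in by_remainder[:leftover]:
        quotas[k] += 1
    kept = []
    for group, quota in zip(groups, quotas):
        kept.extend(rng.sample(group, quota))  # assumed strategy: uniform within a group
    return sorted(kept)


def bagged_subsets(majority, minority, n_rounds, seed=0):
    """Draw one balanced majority subset per bagging round; coverings differ per round."""
    rng = random.Random(seed)
    return [undersample(majority, minority, len(minority), rng)
            for _ in range(n_rounds)]
```

In a full pipeline, each subset returned by bagged_subsets would be combined with the minority samples to train one base classifier, and the classifiers' predictions would be aggregated by voting. The proportional quota is what keeps dense regions of the majority class better represented than sparse ones; the paper's own density-based strategies may differ from this.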

Key words: Undersampling, Imbalanced data, Distribution density, Constructive neural network, Ensemble learning

CLC Number: 

  • TP311