Computer Science, 2023, Vol. 50, Issue (10): 88-95. doi: 10.11896/jsjkx.230600048

• Granular Computing & Knowledge Discovery •

Classification Uncertainty Minimization-based Semi-supervised Ensemble Learning Algorithm

HE Yulin1,2, ZHU Penghui2, HUANG Zhexue1,2, FOURNIER-VIGER Philippe2

  1 Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, Guangdong 518107, China
  2 College of Computer Science & Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
  • Received: 2023-06-05  Revised: 2023-07-28  Online: 2023-10-10  Published: 2023-10-10
  • About author: HE Yulin, born in 1982, Ph.D., research associate, is a member of the China Computer Federation. His main research interests include big data approximate computing technologies, multi-sample statistical theories and methods, and data mining and machine learning algorithms and their applications.
  • Supported by: National Natural Science Foundation of China (61972261), Natural Science Foundation of Guangdong Province (2023A1515011667), Key Basic Research Foundation of Shenzhen (JCYJ20220818100205012) and Basic Research Foundations of Shenzhen (JCYJ20210324093609026).

Abstract: Semi-supervised ensemble learning (SSEL) is a paradigm that combines semi-supervised learning and ensemble learning. It increases the diversity of the ensemble by introducing unlabeled samples and, at the same time, addresses the problem of insufficient training samples in ensemble learning. In addition, SSEL can improve the generalization capability of a classification system by integrating multiple classifiers trained on highly credible labeled samples. Existing studies have demonstrated, from both theoretical and practical perspectives, that semi-supervised learning and ensemble learning benefit each other. However, existing SSEL algorithms cannot make full use of unlabeled samples, which limits their predictive capability on classification problems with few labeled samples. This paper proposes a classification uncertainty minimization-based semi-supervised ensemble learning (CUM-SSEL) algorithm, which adopts information entropy as the confidence criterion and exploits the properties of information entropy to minimize classification uncertainty when predicting unlabeled samples. The feasibility, rationality and effectiveness of CUM-SSEL are verified through a series of experiments. Experimental results demonstrate that CUM-SSEL is an effective algorithm for semi-supervised learning problems.
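The abstract uses information entropy as the confidence criterion for predictions on unlabeled samples. As a rough illustration only (this is not the authors' CUM-SSEL procedure, whose details are not given here), the sketch below averages the class-probability outputs of an ensemble, scores each unlabeled sample by the Shannon entropy H(p) = -Σ_c p_c log p_c of that average, and keeps the k lowest-entropy samples as pseudo-labeled candidates. The function names, the simple averaging rule, the top-k selection, and the toy classifiers `clf_a` and `clf_b` are all assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumption for this sketch: a base classifier is any function that maps a
# sample to a class-probability vector.
Classifier = Callable[[Sequence[float]], List[float]]


def shannon_entropy(probs: Sequence[float]) -> float:
    """H(p) = -sum_c p_c * log(p_c), in nats; terms with p_c = 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ensemble_proba(classifiers: Sequence[Classifier], x: Sequence[float]) -> List[float]:
    """Average the class-probability vectors of all base classifiers."""
    outputs = [clf(x) for clf in classifiers]
    n_classes = len(outputs[0])
    return [sum(out[c] for out in outputs) / len(outputs) for c in range(n_classes)]


def select_low_entropy(
    classifiers: Sequence[Classifier],
    unlabeled: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, int, float]]:
    """Return (index, pseudo_label, entropy) for the k least uncertain samples."""
    scored = []
    for i, x in enumerate(unlabeled):
        p = ensemble_proba(classifiers, x)
        label = max(range(len(p)), key=lambda c: p[c])  # argmax of averaged probs
        scored.append((i, label, shannon_entropy(p)))
    scored.sort(key=lambda t: t[2])  # lowest entropy (most confident) first
    return scored[:k]


# Hypothetical toy classifiers on 1-D inputs, for illustration only.
def clf_a(x: Sequence[float]) -> List[float]:
    p1 = min(max(x[0], 0.0), 1.0)
    return [1.0 - p1, p1]


def clf_b(x: Sequence[float]) -> List[float]:
    p1 = min(max(x[0] + 0.1, 0.0), 1.0)
    return [1.0 - p1, p1]


if __name__ == "__main__":
    unlabeled = [[0.5], [0.95], [0.05], [0.6]]
    for idx, label, h in select_low_entropy([clf_a, clf_b], unlabeled, k=2):
        print(f"sample {idx}: pseudo-label {label}, entropy {h:.3f}")
```

Lower entropy means the averaged prediction is concentrated on one class, so a sample on which both base classifiers lean strongly the same way (index 1, x = 0.95) is ranked ahead of one near the decision boundary (index 0, x = 0.5).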

Key words: Semi-supervised ensemble learning, Ensemble learning, Semi-supervised learning, Classification uncertainty, Confidence, Information entropy

CLC Number: TP391