Computer Science, 2017, Vol. 44, Issue (Z6): 7-13. doi: 10.11896/j.issn.1002-137X.2017.6A.002

• Review •

Semi-supervised and Ensemble Learning: A Review

CAI Yi, ZHU Xiu-fang, SUN Zhang-li, CHEN A-jiao

  1. State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, Beijing 100875; College of Resources Science and Technology, Beijing Normal University, Beijing 100875; College of Resources and Environmental Sciences, Hunan Normal University, Changsha 410081
  • Online: 2017-12-01  Published: 2018-12-01
  • Funding:
    This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (41401479) and the Major Special Project of High-Resolution Earth Observation (civilian part) (02-Y30B06-9001-13115).

Abstract: Semi-supervised learning (SSL) and ensemble learning are two important research directions in machine learning. SSL aims to obtain high-performance classifiers by exploiting both labeled and unlabeled instances, while ensemble learning combines multiple learners to improve the accuracy of weak learners. Semi-supervised ensemble learning is a newer paradigm that combines the two in order to improve the generalization performance of classifiers. First, the development of semi-supervised ensemble learning is analyzed, showing that it originated from disagreement-based SSL. Then, existing methods are analyzed and divided into two categories, SSL-based ensemble learning and ensemble-based SSL, and the main methods in each category are described. Finally, the current state of research is summarized and open problems worth further study are discussed.

Key words: Semi-supervised learning, Ensemble learning, Semi-supervised ensemble learning, Boosting, Bagging, Generalization performance
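The abstract's central observation, that semi-supervised ensemble learning grew out of disagreement-based SSL, can be illustrated with a simplified sketch of tri-training (Zhou and Li), a well-known ensemble-based SSL method: three learners are built by bagging the labeled set, each learner receives pseudo-labels for the unlabeled points on which the other two agree, and predictions are made by majority vote. This is a minimal sketch under stated assumptions, not the authors' method: the 1-nearest-neighbour base learner, the 2-D toy data and the names `fit`, `predict`, `vote` and `tri_training` are hypothetical choices for illustration, and unlike the original algorithm the sketch omits the error-rate estimates and pseudo-label-set size checks that decide when a learner may be updated, stopping instead when the pseudo-labeled sets stop changing.

```python
import random
from collections import Counter


def fit(examples):
    # Hypothetical base learner: 1-nearest-neighbour. A "model" is simply its
    # training set, a list of ((x, y), label) pairs.
    return list(examples)


def predict(model, point):
    # Label of the training example closest to `point` (squared Euclidean).
    _, label = min(
        model,
        key=lambda ex: (ex[0][0] - point[0]) ** 2 + (ex[0][1] - point[1]) ** 2,
    )
    return label


def vote(models, point):
    # Ensemble output: majority vote over the three members.
    return Counter(predict(m, point) for m in models).most_common(1)[0][0]


def tri_training(labeled, unlabeled, max_rounds=10, seed=0):
    rng = random.Random(seed)
    # Bagging step: each member starts from its own bootstrap sample of the
    # labeled set, which gives the three learners their initial diversity.
    models = [fit([rng.choice(labeled) for _ in labeled]) for _ in range(3)]
    previous = None
    for _ in range(max_rounds):
        pseudo = []
        for i in range(3):
            j, k = (m for m in range(3) if m != i)
            # Disagreement-based step: member i receives a pseudo-label for an
            # unlabeled point only when the other two members agree on it.
            # (The original algorithm also checks estimated error rates here.)
            agreed = []
            for x in unlabeled:
                label = predict(models[j], x)
                if label == predict(models[k], x):
                    agreed.append((x, label))
            pseudo.append(agreed)
        if pseudo == previous:
            break  # same pseudo-labels as last round: retraining changes nothing
        # Retrain each member on the full labeled set plus its own pseudo-labels.
        models = [fit(labeled + pseudo[i]) for i in range(3)]
        previous = pseudo
    return models


# Toy data (hypothetical): two well-separated 2-D clusters, "a" and "b".
A = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8)]
B = [(5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (5.2, 5.8)]
labeled = [(p, "a") for p in A] + [(p, "b") for p in B]
unlabeled = [(0.3, 0.3), (0.9, 0.1), (5.3, 5.1), (5.9, 5.4)]

models = tri_training(labeled, unlabeled)
print(vote(models, (0.45, 0.5)), vote(models, (5.55, 5.5)))  # prints: a b
```

The bootstrap step is what makes this an ensemble method in the bagging sense: without it the three 1-NN members would be identical, would agree on every unlabeled point, and the loop would reduce to plain self-training.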

