Computer Science ›› 2017, Vol. 44 ›› Issue (Z6): 7-13. doi: 10.11896/j.issn.1002-137X.2017.6A.002


Semi-supervised and Ensemble Learning: A Review

CAI Yi, ZHU Xiu-fang, SUN Zhang-li and CHEN A-jiao   

  • Online: 2017-12-01 Published: 2018-12-01

Abstract: Semi-supervised learning (SSL) and ensemble learning are two important paradigms in machine learning research. SSL attempts to achieve strong generalization by exploiting both labeled and unlabeled instances, while ensemble learning aims to improve the performance of weak learners by combining multiple classifiers. SSL ensemble learning is a novel paradigm that improves the generalization performance of classifiers by combining SSL with ensemble learning. First, the development of SSL ensemble learning is analyzed, showing that it derives from disagreement-based SSL. Then, SSL ensemble learning methods are classified into two categories: SSL-based ensemble learning and ensemble-based SSL. The main methods of SSL ensemble learning are described in detail. Finally, the current state of research on SSL ensemble learning is summarized and some issues worthy of further study are identified.

Key words: Semi-supervised learning, Ensemble learning, Semi-supervised ensemble learning, Boosting, Bagging, Generalization performance
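
As an illustration only, the following minimal Python sketch shows one way the two paradigms described in the abstract can be combined: a bagged committee pseudo-labels the unlabeled instances on which all of its members agree, then retrains on the enlarged labeled set. It is not taken from the reviewed methods. All function names are hypothetical, and the one-dimensional threshold learner, the class-stratified bootstrap, and the unanimity rule are simplifying assumptions.

```python
import random

def fit_stump(points):
    """Fit a 1-D threshold t that minimizes training errors (predict 1 if x > t)."""
    xs = sorted({x for x, _ in points})
    # Candidate thresholds: below all points, between neighbours, above all points.
    candidates = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return min(candidates, key=lambda t: sum(int(x > t) != y for x, y in points))

def predict(t, x):
    return int(x > t)

def bootstrap(labeled, rng):
    """Resample with replacement within each class (assumption: stratified,
    so that every committee member sees both classes when both are present)."""
    sample = []
    for cls in (0, 1):
        members = [p for p in labeled if p[1] == cls]
        sample += [rng.choice(members) for _ in members]
    return sample

def train_committee(labeled, n_members, rng):
    """Bagging: each member is fitted on its own bootstrap sample."""
    return [fit_stump(bootstrap(labeled, rng)) for _ in range(n_members)]

def vote(committee, x):
    """Majority vote of the committee."""
    return int(2 * sum(predict(t, x) for t in committee) > len(committee))

def ssl_ensemble(labeled, unlabeled, n_members=5, rounds=3, seed=0):
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(unlabeled)
    committee = train_committee(labeled, n_members, rng)
    for _ in range(rounds):
        # Pseudo-label only the instances on which every member agrees.
        agreed = {x for x in pool if len({predict(t, x) for t in committee}) == 1}
        if not agreed:
            break
        labeled += [(x, predict(committee[0], x)) for x in sorted(agreed)]
        pool = [x for x in pool if x not in agreed]
        committee = train_committee(labeled, n_members, rng)
    return committee, labeled
```

Calling `ssl_ensemble(labeled, unlabeled)` with `labeled` as a list of `(x, y)` pairs and `unlabeled` as a list of values returns the final committee and the enlarged labeled set; `vote(committee, x)` then classifies a new instance. Requiring unanimity is a conservative choice that limits the label noise introduced by pseudo-labeling, at the cost of leaving ambiguous instances unused.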

