Computer Science ›› 2022, Vol. 49 ›› Issue (6): 127-133.doi: 10.11896/jsjkx.211100043

• Database & Big Data & Data Science •

Tri-training Algorithm Based on DECORATE Ensemble Learning and Credibility Assessment

WANG Yu-fei, CHEN Wen   

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  • Received:2021-11-03 Revised:2022-03-02 Online:2022-06-15 Published:2022-06-08
  • About author:WANG Yu-fei, born in 1996, postgraduate. His main research interests include semi-supervised learning, cyber security and data mining.
    CHEN Wen, born in 1983, Ph.D, associate professor, master supervisor, is a member of China Computer Federation. His main research interests include network security, information hiding and data mining.
  • Supported by:
    National Key Research and Development Program of China (020YFB1805405, 2019QY0800), National Natural Science Foundation of China (U1736212, 61872255, U19A2068) and Key Laboratory of Pattern Recognition and Intelligent Information Processing, Institutions of Higher Education of Sichuan Province (MSSB-2020-01).

Abstract: Tri-training is a disagreement-based semi-supervised learning algorithm in which semi-supervised learning and ensemble learning mechanisms are applied simultaneously. It improves model performance by effectively leveraging a small set of labeled samples together with a large amount of unlabeled ones through collaboration and iteration among its base classifiers. However, when the labeled sample size is insufficient, the initial classifiers generated by Tri-training are not sufficiently trained, and mislabeled noisy data may be produced during the collaborative labeling process among the classifiers. To address these problems, a collaborative learning algorithm is proposed that combines DECORATE ensemble learning, diversity measurement and credibility assessment. To improve generalization performance, multiple preference classifiers are generated by DECORATE from differentiated artificial data and labels, their diversity is measured with Jensen-Shannon divergence, and the classifiers are then selected so as to maximize ensemble diversity. At the same time, the credibility of the pseudo-labeled samples is assessed during the iterations with a label propagation algorithm to reduce noisy data. Classification experiments on UCI data sets demonstrate that the proposed algorithm achieves higher accuracy and F1-score than the Tri-training algorithm and its improved versions.
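
To make the procedure concrete, the sketch below illustrates two of the ingredients described above with standard scikit-learn and SciPy components: selecting a maximally diverse classifier triple by Jensen-Shannon divergence, and screening pseudo-labels with label propagation. It is a minimal illustration under stated assumptions, not the authors' implementation: randomized decision trees stand in for DECORATE's artificial-data mechanism, and X_lab, y_lab, X_unlab, pseudo_y and the 0.8 credibility cutoff are placeholder names and values.

```python
# Illustrative sketch only (assumptions noted above), not the paper's released code.
import numpy as np
from itertools import combinations
from scipy.spatial.distance import jensenshannon
from sklearn.tree import DecisionTreeClassifier
from sklearn.semi_supervised import LabelPropagation


def train_candidate_pool(X_lab, y_lab, n_candidates=9):
    """Train candidate trees with randomized splits so they develop different
    preferences (a lightweight stand-in for DECORATE's artificial-data step)."""
    return [DecisionTreeClassifier(splitter="random", random_state=k).fit(X_lab, y_lab)
            for k in range(n_candidates)]


def pick_most_diverse_triple(pool, X_unlab):
    """Select the 3 classifiers whose summed pairwise Jensen-Shannon divergence
    on the unlabeled data is largest, i.e. a diversity-maximizing selection."""
    probs = [clf.predict_proba(X_unlab) for clf in pool]

    def js(a, b):  # mean JS divergence between two classifiers' predicted distributions
        return np.mean([jensenshannon(p, q) ** 2 for p, q in zip(probs[a], probs[b])])

    best = max(combinations(range(len(pool)), 3),
               key=lambda trio: sum(js(i, j) for i, j in combinations(trio, 2)))
    return [pool[i] for i in best]


def credible_pseudo_labels(X_lab, y_lab, X_unlab, pseudo_y, threshold=0.8):
    """Keep only pseudo-labeled samples whose label-propagation posterior agrees
    with the pseudo-label above an assumed confidence threshold.
    Assumes integer class labels so -1 can mark unlabeled points."""
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, -np.ones(len(X_unlab), dtype=int)])  # -1 = unlabeled
    lp = LabelPropagation(kernel="rbf", gamma=20).fit(X_all, y_all)
    posterior = lp.label_distributions_[len(X_lab):]              # rows sum to 1
    cols = [list(lp.classes_).index(c) for c in pseudo_y]         # column of each pseudo-label
    confidence = posterior[np.arange(len(pseudo_y)), cols]
    return confidence >= threshold                                # boolean keep-mask
```

Within an actual Tri-training round, the mask returned by credible_pseudo_labels would gate which newly labeled samples are handed to each base classifier before retraining.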

Key words: Credibility assessment, Disagreement-based semi-supervised learning, Diversity measure, Ensemble learning

CLC Number: TP181