Computer Science, 2017, Vol. 44, Issue (1): 48-52. doi: 10.11896/j.issn.1002-137X.2017.01.009

• 2016 6th China Conference on Data Mining •

Parallel Selective Ensemble Algorithm Based on Hierarchical Selection and Dynamic Updating

吴梅红,郭佳盛,鞠颖,林子雨,邹权   

  1. Department of Computer Science, Xiamen University, Xiamen 361005; School of Computer Science and Technology, Tianjin University, Tianjin 300072
  • Published: 2018-11-13 Online: 2018-11-13
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61370010, 4, 31200769)

Selective Ensemble Learning Algorithm Based on Hierarchical Selection and Dynamic Updating in Parallel

WU Mei-hong, GUO Jia-sheng, JU Ying, LIN Zi-yu and ZOU Quan   

  • Online: 2018-11-13 Published: 2018-11-13

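Abstract: This paper proposes a selective ensemble learning algorithm that uses multiple threads to optimize the parameters of the base classifiers in parallel, and obtains the optimal set of candidate base classifiers through multi-layer screening with dynamically updated screening information, addressing the low efficiency of classifier selection in earlier ensemble learning methods. The ensemble classifier performs weighted voting with a divide-and-merge strategy: the voting task on a large dataset is recursively bisected into multiple subtasks, the subtasks run in parallel, and their voting results are merged, which shortens the voting time of the ensemble classifier. Experimental results show that, compared with traditional methods, the proposed algorithm achieves significant improvements in average accuracy, F1-Measure, and AUC.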

Keywords: Selective ensemble learning, Divide-and-conquer algorithm, Parallel computing, Classification

Abstract: This paper proposes a selective ensemble learning algorithm based on hierarchical selection and dynamic updating. It optimizes the parameters of the base classifiers with multiple threads and selects a subset of classifiers using hierarchical selection and dynamically updated information, which addresses the inefficiency of classifier selection in earlier ensemble learning methods. In addition, a divide-and-conquer strategy is employed to reduce the time cost of ensemble voting: a large voting task is recursively bisected into small subtasks, the subtasks are executed in parallel, and their voting results are merged. Experimental results show that the selective algorithm outperforms traditional classification algorithms on F1-Measure and AUC.

Key words: Selective ensemble learning, Divide-and-conquer, Parallel computation, Classification
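
The divide-and-merge voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`vote_block`, `split_ranges`, `parallel_weighted_vote`), the `threshold` and `workers` parameters, and the toy classifiers and weights are all assumptions made for illustration. The sketch bisects the sample range recursively until each piece is small enough, runs a weighted vote on every piece in a thread pool, and concatenates the partial results in order.

```python
from concurrent.futures import ThreadPoolExecutor


def vote_block(classifiers, weights, samples):
    """Weighted vote for every sample in one block (the sequential base case)."""
    results = []
    for x in samples:
        tally = {}
        for clf, w in zip(classifiers, weights):
            label = clf(x)
            tally[label] = tally.get(label, 0.0) + w
        # The label with the largest accumulated weight wins.
        results.append(max(tally, key=tally.get))
    return results


def split_ranges(lo, hi, threshold):
    """Recursively bisect [lo, hi) until each piece holds at most `threshold` samples."""
    if hi - lo <= threshold:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return split_ranges(lo, mid, threshold) + split_ranges(mid, hi, threshold)


def parallel_weighted_vote(classifiers, weights, samples, threshold=2, workers=4):
    """Split the voting task, vote on the pieces in parallel, merge in order."""
    ranges = split_ranges(0, len(samples), threshold)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the merge is a plain concatenation.
        parts = pool.map(
            lambda r: vote_block(classifiers, weights, samples[r[0]:r[1]]),
            ranges,
        )
        return [label for part in parts for label in part]


# Hypothetical base classifiers: simple threshold rules on a scalar feature.
classifiers = [lambda x: int(x > 0), lambda x: int(x > 2), lambda x: int(x > -1)]
weights = [0.5, 0.3, 0.2]  # assumed voting weights, e.g. from validation accuracy
samples = [-2, 0, 1, 3, 5]

predictions = parallel_weighted_vote(classifiers, weights, samples)
print(predictions)
```

The ranges are computed first and then submitted as flat tasks, rather than having each task spawn and wait on its own children, because nested waits inside a bounded thread pool can deadlock. In CPython, threads only help when the classifiers release the global interpreter lock; for CPU-bound pure-Python classifiers a process pool would be the more likely choice.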

