Computer Science (计算机科学), 2018, Vol. 45, Issue (11A): 121-125.

• Intelligent Computing •

Research and Application of Ensemble Learning Using Gradient Optimization Decision Tree

WANG Yan-bin (王延斌), WU You-xi (武优西), LIU Hong-pu (刘洪普)

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
    Hebei Province Key Laboratory of Big Data Calculation, Tianjin 300401, China
  • Online: 2019-02-26 Published: 2019-02-26
  • About the authors: WANG Yan-bin (born 1974), male, master's student; research interest: machine learning. WU You-xi (born 1974), male, Ph.D., professor, doctoral supervisor, senior member of CCF; research interests: data mining and machine learning. LIU Hong-pu (born 1977), male, Ph.D. candidate, lecturer; research interest: machine learning.
  • Supported by:
    This work was supported by the Natural Science Foundation of Hebei Province (F2016202145).

Abstract: Ensemble learning completes a learning task by building multiple classifiers with complementary behavior in order to reduce classification error. However, existing work does not consider the local validity of the classifiers. To address this, a hierarchical multi-class classification algorithm is proposed within the ensemble learning framework. The algorithm decomposes the problem by predicted category and, on top of this layering, integrates several weak classifiers to improve prediction accuracy. Experiments on a real-world admissions data set from a U.S. university and on three UCI data sets verify the effectiveness of the algorithm.

Key words: Classifier fusion, Ensemble learning, Gradient optimization, Hierarchical structure
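
The abstract gives no algorithmic detail, so the following is only a minimal sketch of one possible reading of "decompose by predicted category, then combine classifiers per layer". It is not the authors' method and contains no gradient-optimized decision trees. The names `majority_vote`, `hierarchical_predict`, `router`, the toy rules, and all thresholds are hypothetical and chosen purely for illustration: a first-layer classifier predicts a category, and a local ensemble attached to that category refines the prediction by majority vote.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A classifier here is just any function mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> str:
    """Return the label that most classifiers in the ensemble predict for x."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def hierarchical_predict(
    router: Classifier,
    local_ensembles: Dict[str, List[Classifier]],
    x: Sequence[float],
) -> str:
    """Layer 1 predicts a category; layer 2 refines it with that category's ensemble."""
    coarse = router(x)
    ensemble = local_ensembles.get(coarse)
    if not ensemble:
        # No local ensemble for this category: keep the first-layer prediction.
        return coarse
    return majority_vote(ensemble, x)


# --- Hypothetical toy rules on x = (gpa, test_score); thresholds are made up. ---
def router(x: Sequence[float]) -> str:
    return "admit" if x[0] >= 3.0 else "reject"


def by_gpa(x: Sequence[float]) -> str:
    return "admit" if x[0] >= 3.5 else "waitlist"


def by_score(x: Sequence[float]) -> str:
    return "admit" if x[1] >= 600 else "waitlist"


def by_combined(x: Sequence[float]) -> str:
    return "admit" if x[0] + x[1] / 200 >= 6.5 else "waitlist"


# Only samples first predicted as "admit" are passed to a local ensemble.
LOCAL_ENSEMBLES: Dict[str, List[Classifier]] = {
    "admit": [by_gpa, by_score, by_combined],
}

if __name__ == "__main__":
    for sample in [(3.8, 700.0), (3.2, 550.0), (2.5, 700.0)]:
        print(sample, hierarchical_predict(router, LOCAL_ENSEMBLES, sample))
```

In this sketch each local ensemble only ever sees inputs routed to its category, which is one way a combination scheme could take the local validity of classifiers into account. A real implementation would train the router and the local members on data rather than hard-coding rules.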

CLC Number: TP181