Computer Science ›› 2013, Vol. 40 ›› Issue (11): 236-241.

• Artificial Intelligence •

Forest Thinning via Contribution Gain

GUO Hua-ping and FAN Ming

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450052, China
  • Online: 2018-11-16  Published: 2018-11-16
  • Funding:
    Supported by the National Natural Science Foundation of China (Research on Several Key Techniques of Preference Learning, 60901078)

Abstract: An ensemble of decision trees can be viewed as a forest. We propose a forest thinning algorithm that prunes such a forest to simplify the ensemble's structure and improve its classification accuracy. Traditional decision tree pruning considers only the effect of pruning on a single tree; forest thinning instead treats all decision trees as a whole and focuses on how pruning affects the performance of the entire ensemble. To determine which branches of the forest can be pruned, we propose a metric called contribution gain. The contribution gain of a subtree depends not only on the classification accuracy of its host tree but also on the diversity among the trees, so it reasonably measures how much expanding a node into a subtree improves the ensemble's classification accuracy. Based on this metric, we design a forest thinning algorithm named FTCG (Forest Thinning via Contribution Gain). Experiments show that FTCG further reduces the size of every decision tree and significantly improves the thinned ensemble's classification accuracy on most data sets, no matter whether the forest is built by an algorithm such as bagging or obtained by an ensemble selection algorithm such as EPIC [1], and no matter whether each decision tree is unpruned or already pruned.

Key words: Forest thinning, Ensemble selection, Contribution gain
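
To make the idea above concrete, the following is a minimal, self-contained sketch of forest thinning in Python with scikit-learn. It is an illustration only, not the authors' FTCG: the paper's contribution gain also accounts for the diversity among trees, which is not reproduced on this page, so the sketch uses a simpler proxy, namely the change in majority-vote accuracy on a held-out pruning set when a node's subtree is collapsed into a leaf. The dataset, the greedy node order, and the function names (tree_predict, ensemble_accuracy, thin_forest) are illustrative assumptions.

# Sketch of forest thinning: greedily collapse subtrees whose removal does not
# reduce the ensemble's majority-vote accuracy on a held-out pruning set.
# NOTE: simplified proxy for the paper's contribution gain (diversity term omitted).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split


def tree_predict(tree, X, pruned):
    """Predict with a fitted sklearn tree, treating node ids in `pruned` as leaves."""
    t = tree.tree_
    out = np.empty(X.shape[0], dtype=int)
    for i, x in enumerate(X):
        node = 0
        while t.children_left[node] != -1 and node not in pruned:
            if x[t.feature[node]] <= t.threshold[node]:
                node = t.children_left[node]
            else:
                node = t.children_right[node]
        out[i] = int(tree.classes_[np.argmax(t.value[node])])  # majority class at node
    return out


def ensemble_accuracy(trees, X, y, pruned_sets):
    """Majority-vote accuracy of the forest under the given per-tree (virtual) pruning."""
    votes = np.array([tree_predict(tr, X, pr) for tr, pr in zip(trees, pruned_sets)])
    maj = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    return float(np.mean(maj == y))


def thin_forest(trees, X_prune, y_prune):
    """Collapse a subtree whenever doing so does not hurt ensemble accuracy."""
    pruned_sets = [set() for _ in trees]
    base = ensemble_accuracy(trees, X_prune, y_prune, pruned_sets)
    for ti, tree in enumerate(trees):
        t = tree.tree_
        # internal (non-leaf) nodes, skipping the root so each tree keeps one split
        internal = [n for n in range(t.node_count) if t.children_left[n] != -1 and n != 0]
        for node in internal:
            trial = [s | {node} if i == ti else s for i, s in enumerate(pruned_sets)]
            acc = ensemble_accuracy(trees, X_prune, y_prune, trial)
            if acc >= base:            # non-positive "gain" from keeping this subtree
                pruned_sets[ti].add(node)
                base = acc
    return pruned_sets


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_pr, X_te, y_pr, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    # a bagged forest of (unpruned) decision trees
    bag = BaggingClassifier(n_estimators=10, random_state=0).fit(X_tr, y_tr)
    trees, empty = bag.estimators_, [set() for _ in bag.estimators_]

    pruned = thin_forest(trees, X_pr, y_pr)
    print("test accuracy before thinning:", ensemble_accuracy(trees, X_te, y_te, empty))
    print("test accuracy after thinning: ", ensemble_accuracy(trees, X_te, y_te, pruned))

Collapsing is done "virtually" (per-tree sets of pruned node ids), so the fitted trees themselves are never modified, which makes it easy to compare the ensemble before and after thinning on the same test split.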

[1] Lu Z Y,Wu X D,Zhu X Q,et al.Ensemble Pruning via Individual Contribution Ordering[C]∥Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2010:871-880
[2] Breiman L.Bagging predictors[J].Machine Learning,1996,24(2):123-144
[3] Freund Y,Schapire R E.A decision-theoretic generalization of on-line learning and an application to boosting[J].Journal of Computer and System Sciences,1997,55(1):119-139
[4] Rodriguez J J,Kuncheva L I,Alonso C J.Rotation forest:A new classifier ensemble method[J].IEEE Transactions on Pattern Analysis and Machine Intelligence(PAMI),2006,28(10):1619-1630
[5] Martínez-Muñoz G,Suárez A.Pruning in ordered bagging ensembles[C]∥Proceedings of the Twenty-Third International Conference on Machine Learning(ICML 2006).2006:609-616
[6] Partalas I,Tsoumakas G,Vlahavas I P.An ensemble uncertainty aware measure for directed hill climbing ensemble pruning[J].Machine Learning,2010,81(3):267-282
[7] Guo H P,Fan M.Ensemble Pruning via Base-Classifier Replacement[C]∥Proceedings of the 12th International Conference on Web-Age Information Management(WAIM).Springer,2011:505-516
[8] Quinlan J R.C4.5:programs for machine learning[M].Morgan Kaufmann,1993
[9] Webb G I.Further Experimental Evidence against the Utility of Occam's Razor[J].Journal of Artificial Intelligence Research,1996,4:397-417
[10] Kuncheva L I.Combining Pattern Classifiers:Methods and Algorithms[M].John Wiley and Sons,2004
[11] Schapire R,Freund Y,Bartlett P,et al.Boosting the margin:A new explanation for the effectiveness of voting methods[J].The Annals of Statistics,1998:1651-1686
[12] Breiman L.Arcing the edge[R].University of California,Berkeley,CA,1997
[13] Asuncion A,Newman D J.UCI Machine Learning Repository.http://archive.ics.uci.edu/ml/,2007
[14] Witten I H,Frank E.Data Mining:Practical Machine Learning Tools and Techniques(Second Edition)[M].Morgan Kaufmann,2005
