Computer Science ›› 2013, Vol. 40 ›› Issue (11): 236-241.


Forest Thinning via Contribution Gain

GUO Hua-ping and FAN Ming   

Online: 2018-11-16    Published: 2018-11-16

Abstract: An ensemble consisting of decision trees can be treated as a forest. We propose a new strategy, called forest thinning, to reduce the ensemble size and improve its accuracy. Unlike traditional decision tree pruning methods, forest thinning treats all decision trees in a forest as a whole and evaluates the influence on ensemble performance when a certain branch is pruned. To determine which branches should be pruned, we propose a new metric called contribution gain. The contribution gain of a subtree is related not only to the accuracy of its host tree but also to the diversity of the trees in the ensemble, so it measures reasonably well how much the ensemble accuracy can improve when that subtree is pruned. Based on the contribution gain, we design a forest thinning algorithm named FTCG (Forest Thinning via Contribution Gain). Our experiments show that forest thinning can significantly reduce the structural complexity of forests and improve their accuracy on most data sets, no matter whether the ensembles are constructed by an algorithm such as bagging or obtained by an ensemble selection algorithm such as EPIC [1], and no matter whether each decision tree is pruned or unpruned.
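To make the greedy scheme concrete, the following is a minimal Python sketch of forest thinning. The abstract does not give the exact contribution-gain formula (which, per the paper, also accounts for tree diversity), so this sketch approximates the gain of collapsing a subtree into a leaf by the resulting change in majority-vote accuracy of the whole forest on a held-out validation set. The Node structure and all function names (thin_forest, forest_accuracy, and so on) are illustrative assumptions, not the paper's implementation.

# A minimal sketch of forest thinning (assumed names; not the paper's code).
from __future__ import annotations
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    """A binary decision-tree node: internal if `feature` is set, else a leaf."""
    label: int                        # majority class of the node's training data
    feature: Optional[int] = None     # split feature index; None marks a leaf
    threshold: float = 0.0            # split threshold for `feature`
    left: Optional[Node] = None
    right: Optional[Node] = None

    def predict(self, x: Sequence[float]) -> int:
        if self.feature is None:
            return self.label
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)

def forest_predict(forest: list[Node], x: Sequence[float]) -> int:
    # Unweighted majority vote over all trees in the forest.
    return Counter(tree.predict(x) for tree in forest).most_common(1)[0][0]

def forest_accuracy(forest: list[Node], X, y) -> float:
    return sum(forest_predict(forest, x) == t for x, t in zip(X, y)) / len(y)

def internal_nodes(node: Node):
    # Yield every internal node, i.e. every candidate subtree to collapse.
    if node.feature is not None:
        yield node
        yield from internal_nodes(node.left)
        yield from internal_nodes(node.right)

def thin_forest(forest: list[Node], X_val, y_val) -> list[Node]:
    """Repeatedly collapse the subtree whose removal most improves the
    ensemble's validation accuracy (a stand-in for contribution gain),
    stopping when no collapse yields a strictly positive gain."""
    base = forest_accuracy(forest, X_val, y_val)
    while True:
        best_gain, best_node = 0.0, None
        for tree in forest:
            for node in internal_nodes(tree):
                saved = node.feature
                node.feature = None            # tentatively turn node into a leaf
                gain = forest_accuracy(forest, X_val, y_val) - base
                node.feature = saved           # undo the tentative collapse
                if gain > best_gain:
                    best_gain, best_node = gain, node
        if best_node is None:                  # no branch is worth pruning
            return forest
        best_node.feature = None               # commit: drop the whole subtree
        best_node.left = best_node.right = None
        base += best_gain

Note that this brute-force version re-evaluates the entire forest for every candidate collapse; avoiding such repeated evaluation is presumably one motivation for a closed-form contribution-gain metric.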

Key words: Forest thinning, Ensemble selection, Contribution gain

[1] Lu Z Y, Wu X D, Zhu X Q, et al. Ensemble Pruning via Individual Contribution Ordering [C]∥Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010: 871-880
[2] Breiman L. Bagging predictors [J]. Machine Learning, 1996, 24(2): 123-140
[3] Freund Y, Schapire R E. A decision-theoretic generalization of on-line learning and an application to boosting [J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139
[4] Rodriguez J J, Kuncheva L I, Alonso C J. Rotation forest: A new classifier ensemble method [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2006, 28(10): 1619-1630
[5] Martínez-Muñoz G, Suárez A. Pruning in ordered bagging ensembles [C]∥Proceedings of the Twenty-Third International Conference on Machine Learning (ICML 2006). 2006: 609-616
[6] Partalas I, Tsoumakas G, Vlahavas I P. An ensemble uncertainty aware measure for directed hill climbing ensemble pruning [J]. Machine Learning, 2010, 81(3): 267-282
[7] Guo H P, Fan M. Ensemble Pruning via Base-Classifier Replacement [C]∥Proceedings of the 12th International Conference on Web-Age Information Management (WAIM). Springer, 2011: 505-516
[8] Quinlan J R. C4.5: Programs for machine learning [M]. Morgan Kaufmann, 1993
[9] Webb G I. Further Experimental Evidence against the Utility of Occam's Razor [J]. Journal of Artificial Intelligence Research, 1996, 4: 397-417
[10] Kuncheva L I. Combining Pattern Classifiers: Methods and Algorithms [M]. John Wiley and Sons, 2004
[11] Schapire R, Freund Y, Bartlett P, et al. Boosting the margin: A new explanation for the effectiveness of voting methods [J]. The Annals of Statistics, 1998, 26(5): 1651-1686
[12] Breiman L. Arcing the edge [R]. University of California, Berkeley, CA, 1997
[13] Asuncion A, Newman D J. UCI machine learning repository [EB/OL]. http://archive.ics.uci.edu/ml/, 2007
[14] Witten I H, Frank E. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition) [M]. Morgan Kaufmann, 2005
