计算机科学 ›› 2013, Vol. 40 ›› Issue (11): 236-241.
郭华平,范明
GUO Hua-ping and FAN Ming
摘要: 基于决策树的组合分类器可以看作一个森林。提出了一种森林剪枝算法来对森林进行剪枝,以简化组合分类器的结构,并提高其分类准确率。传统的决策树剪枝只考虑剪枝对单棵决策树的影响,而森林剪枝则把所有决策树看作一个整体,更加关注剪枝对组合分类器的性能影响。为了确定森林的哪些分枝可以被剪枝,提出一种称作贡献增益的度量。子树的贡献增益不仅与它所在的决策树的分类准确率有关,而且也与诸决策树的差异性有关,因此它较好地度量了一个结点扩展为一棵子树对组合分类器分类准确率的提高程度。借助于贡献增益,设计了一种基于结点贡献增益的森林剪枝算法FTCG。实验表明,无论森林是基于某种算法(如bagging)构建的还是某种组合分类器选择算法(如EPIC[1])的结果,无论每棵决策树是未剪枝的还是剪枝后的,FTCG都能进一步降低每棵决策树的规模,并且在大部分数据集上显著提高了剪枝后的组合分类器的分类准确率。
[1] Lu Z Y,Wu X D,Zhu X Q,et al.Ensemble Pruning via Individual Contribution Ordering[A]∥Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2010[C].2010:871-880 [2] Breiman L.Bagging predictors[J].Machine Learning,1996,24(2):123-144 [3] Freund Y,Schapire R F.A decision-theoretic generalization ofon-line learning and an application to boosting[J].Journal of Computer and System Sciences,1997,55(1):119-139 [4] Rodriguez J J,Kuncheva L I,Alonso C J.Rotation forest:A newclassifier ensemble method[J].IEEE Transactions on Pattern Analysis and Machine Intelligence(PAMI),2006,28(10):1619-1630 [5] Martinez-Munoz G,Suacutearez A.Pruning in ordered bagging ensembles[C]∥Proceedings of the Twenty-Third International Conference on Machine Learning(ICML 2006).2006:609-616 [6] Partalas I,Tsoumakas G,Vlahavas I P.An ensemble uncertainty aware measure for directed hill climbing ensemble pruning[J].Machine Learning,2010,81(3):267-282 [7] Guo H P,Fan M.Ensemble Pruning via Base-Classifier Replacement[A]∥The 12th International Conference on Web-Age Information Management(WAIM)[C].Springer 2011:505-516 [8] Quinlan J R.C4.5:programs for machine learning[M].Morgan Kaufmann,1993 [9] Webb G I.Further Experimental Evidence against the Utility ofOccam''s Razor[J].Journal of Artificial Intelligence Research,1996,4:397-417 [10] Kuncheva L I.Combining Pattern Classifiers:Methods and Algorithms[J].John Wiley and Sons,2004 [11] Schapire R,Freund Y,Bartlett P,et al.Boosting the margin:Anew explanation for the effectiveness of voting methods[J].The Annals of Statistics,1998:1651-1686 [12] Breiman L.Arcing the edge[R].University of California,Berkeley,CA,1997 [13] Asuncion D N A.UCI machine learning repository.http://archive.ics.uci.edu/ml/,2007 [14] Witten I H,Frank E.Data Mining:Practical Machine Learning Tools and Techniques(Second Edition)[M].Morgan Kaufmann,2005 |
No related articles found! |
|