Computer Science, 2017, Vol. 44, Issue (Z11): 98-101. doi: 10.11896/j.issn.1002-137X.2017.11A.019


Cost-sensitive Random Forest Classifier with New Impurity Measurement

SHI Yan-wen and WANG Hong-jie   

Online: 2018-12-01  Published: 2018-12-01

Abstract: To classify imbalanced data sets effectively, a classifier that combines cost-sensitive learning with the random forest algorithm is proposed. First, a new impurity measure is introduced that accounts not only for the total cost of the decision tree but also for the difference in cost that the same node incurs for different samples. Next, the random forest procedure draws K samples from the data set, one for each of K base classifiers. Each base classifier is a decision tree built with the classification and regression tree (CART) algorithm using the proposed impurity measure, and together these trees form the forest. Finally, the forest classifies each instance by voting among its trees. On data sets from the UCI repository, the proposed classifier performs well in terms of classification accuracy, AUC, and Kappa coefficient when compared with the traditional random forest and an existing cost-sensitive random forest classifier.
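The pipeline outlined in the abstract (K bootstrap samples, one tree per sample grown with a cost-aware impurity, then a vote) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the formula of the proposed impurity measure, so `cost_weighted_gini` below substitutes a generic cost-weighted Gini index, and depth-1 trees (stumps) stand in for full CART trees to keep the example short. All function names, the `cost` dictionary, and the toy data are hypothetical.

```python
import random
from collections import Counter


def cost_weighted_gini(labels, cost):
    """Gini impurity after scaling each class count by its misclassification cost.

    Assumption: a generic cost-weighted Gini, NOT the measure proposed in the paper.
    """
    weighted = [cost[c] * n for c, n in Counter(labels).items()]
    total = sum(weighted)
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in weighted)


def cost_vote(labels, cost):
    """Leaf label: the class with the largest cost-scaled count."""
    counts = Counter(labels)
    return max(counts, key=lambda c: cost[c] * counts[c])


def fit_stump(X, y, cost):
    """Depth-1 tree: choose the (feature, threshold) with the lowest impurity."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * cost_weighted_gini(left, cost)
                     + len(right) * cost_weighted_gini(right, cost)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, cost_vote(left, cost), cost_vote(right, cost))
    if best is None:  # no usable split: fall back to a constant prediction
        label = cost_vote(y, cost)
        return (0, float("inf"), label, label)
    return best[1:]


def fit_forest(X, y, cost, k=25, seed=0):
    """Draw k bootstrap samples and fit one stump on each."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], cost))
    return forest


def predict(forest, row):
    """Majority vote over the stumps in the forest."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Toy imbalanced data (hypothetical): 16 majority samples, 4 minority samples.
X = [[float(i)] for i in range(16)] + [[20.0], [21.0], [22.0], [23.0]]
y = [0] * 16 + [1] * 4
cost = {0: 1.0, 1: 4.0}  # assumed: misclassifying the minority class costs more
forest = fit_forest(X, y, cost)
print(predict(forest, [0.0]), predict(forest, [21.0]))
```

Scaling class counts by cost makes a node containing a few expensive minority samples look as impure as a balanced node, which pushes splits toward isolating the minority class. The measure in the paper is described as going further by also considering per-sample cost differences at a node, which this sketch does not attempt.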

Key words: Cost-sensitive learning, Random forest, Impurity measurement, Classification and regression tree (CART), Imbalanced data

