Computer Science ›› 2017, Vol. 44 ›› Issue (Z11): 98-101. doi: 10.11896/j.issn.1002-137X.2017.11A.019

• Intelligent Computing •

Cost-sensitive Random Forest Classifier with New Impurity Measurement

SHI Yan-wen and WANG Hong-jie

School of Computer Science, Southwest Petroleum University, Chengdu 610500, China

• Online: 2018-12-01  Published: 2018-12-01


Abstract: To classify imbalanced data sets effectively, a classifier combining cost-sensitive learning with the random forest algorithm is proposed. First, a new impurity measure is proposed that accounts not only for the total cost of the decision tree but also for the cost differences of different samples at the same node. Next, the random forest algorithm is executed: the data set is sampled K times and K base classifiers are built. Each decision tree is then constructed with the classification and regression tree (CART) algorithm using the proposed impurity measure, forming a forest of decision trees. Finally, the random forest makes the classification decision by a voting mechanism. In experiments on UCI data sets, compared with the traditional random forest and an existing cost-sensitive random forest classifier, the proposed classifier performs well on three measures: classification accuracy, area under the ROC curve (AUC), and the Kappa coefficient.

Key words: Cost-sensitive learning, Random forest, Impurity measurement, Classification and regression tree (CART), Imbalanced data
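The pipeline described in the abstract (cost-aware impurity, K bootstrap samples, majority vote) can be sketched as follows. The paper's exact impurity formula is not reproduced on this page, so the cost-weighted Gini below is only an assumption of the general idea, and the names `cost_weighted_gini`, `bootstrap_indices`, and `forest_vote` are illustrative, not from the paper.

```python
import numpy as np
from collections import Counter


def cost_weighted_gini(labels, costs):
    """Gini-style impurity in which each class's share is reweighted by its
    misclassification cost, so a node that mixes in expensive (typically
    minority) classes looks more impure. A sketch of the general idea only,
    not the paper's exact measure.

    labels -- sequence of class labels of the samples at a node
    costs  -- dict mapping class label -> misclassification cost
    """
    counts = Counter(labels)
    weighted = {c: n * costs[c] for c, n in counts.items()}
    total = sum(weighted.values())
    probs = np.array([w / total for w in weighted.values()])
    return float(1.0 - np.sum(probs ** 2))


def bootstrap_indices(rng, n_samples, k_trees):
    """K bootstrap samples (n draws with replacement), one per base tree."""
    return [rng.integers(0, n_samples, size=n_samples) for _ in range(k_trees)]


def forest_vote(per_tree_predictions):
    """Final forest decision for one sample: majority vote over the K trees."""
    return Counter(per_tree_predictions).most_common(1)[0][0]
```

With equal costs this reduces to the ordinary Gini impurity (0.5 for a balanced two-class node); raising the cost of one class shifts the weighted class proportions, so splits that isolate the expensive class are favored during CART construction.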

[1] 刘偲, 秦亮曦. Research on cost-sensitive attribute reduction in fuzzy decision-theoretic rough sets [J]. Computer Science, 2016, 43(S2): 67-72.
[2] LÓPEZ V, FERNÁNDEZ A, MORENO-TORRES J G, et al. Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics [J]. Expert Systems with Applications, 2012, 39(7): 6585-6608.
[3] AODHA O M, BROSTOW G J. Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees [J]. 2013, 25(6): 193-200.
[4] 邓生雄, 雒江涛, 刘勇, et al. Classification model based on ensemble random forests [J]. Application Research of Computers, 2015, 32(6): 1621-1624.
[5] 赵士伟, 卓力, 王素玉, et al. A cost-sensitive decision tree construction method based on NNIA multi-objective optimization [J]. Acta Electronica Sinica, 2011, 39(10): 2348-2352.
[6] BAHNSEN A C, AOUADA D, OTTERSTEN B. Example-dependent cost-sensitive decision trees [J]. Expert Systems with Applications, 2015, 42(19): 6609-6619.
[7] 邓少军, 冯少荣, 林子雨. A new multi-class cost-sensitive algorithm [J]. Journal of Xiamen University (Natural Science), 2017, 56(2): 231-236.
[8] THAI-NGHE N, GANTNER Z, SCHMIDT-THIEME L. Cost-sensitive learning methods for imbalanced data [C]∥International Joint Conference on Neural Networks. IEEE, 2010: 1-8.
[9] ZHOU Q, ZHOU H, LI T. Cost-sensitive feature selection using random forest: Selecting low-cost subsets of informative features [J]. Knowledge-Based Systems, 2016, 95(3): 1-11.
[10] 王爱平, 万国伟, 程志全, et al. Incremental extremely random forest classifier for online learning [J]. Journal of Software, 2011, 22(9): 2059-2074.
[11] 张钰, 陈珺, 王晓峰, et al. Application of random forest in rolling bearing fault diagnosis [J]. Computer Engineering and Applications, 2017, 3(6): 312-319.
[12] 胡记兵. Construction and deployment of decision-tree-based combined classifiers [D]. Hangzhou: Zhejiang University of Technology, 2008: 17-18.
[13] SOFEIKOV K I, TYUKIN I Y, GORBAN A N, et al. Learning optimization for decision tree classification of non-categorical data with information gain impurity criterion [C]∥International Joint Conference on Neural Networks. IEEE, 2014: 3548-3555.
[14] D’AMBROSIO A, TUTORE V A. Conditional Classification Trees by Weighting the Gini Impurity Measure [M]∥New Perspectives in Statistical Modeling and Data Analysis. Springer Berlin Heidelberg, 2011: 273-280.
[15] 黄光鑫. Support vector data description, support vector machines and their applications [D]. Chengdu: University of Electronic Science and Technology of China, 2011: 64-66.
