Computer Science (计算机科学), 2019, Vol. 46, Issue 2: 196-201. doi: 10.11896/j.issn.1002-137X.2019.02.030
GUAN Xiao-qiang (关晓蔷), PANG Ji-fang (庞继芳), LIANG Ji-ye (梁吉业)
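Abstract: Random forest is a widely used classification method in data mining and machine learning. It has become a research focus for scholars in China and abroad and has been applied to many practical problems. Traditional random forest methods do not consider the effect of the number of classes on classification performance and ignore the association between base classifiers and classes, which limits the performance of random forests on multi-class problems. To better address this issue, a random forest algorithm based on class randomization (RCRF) is proposed that takes the characteristics of multi-class problems into account. From the perspective of classes, class randomization is added to the two traditional randomizations of random forest, and base classifiers with different emphases are designed for different classes. Because different base classifiers focus on distinguishing different classes, the structures of the resulting decision trees also differ, which both preserves the performance of each individual base classifier and further increases the diversity of the base classifiers. To verify the effectiveness of the proposed algorithm, RCRF was compared with other algorithms on 21 data sets from the UCI repository. The experiments were conducted from two aspects: first, the performance of RCRF was evaluated using three measures, namely accuracy, F1-measure, and the Kappa coefficient; second, κ-error diagrams were used to compare and analyze the algorithms from the perspective of diversity. The experimental results show that the proposed algorithm effectively improves the overall performance of the ensemble model and has a clear advantage on multi-class problems.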
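The abstract names the Kappa coefficient and κ-error diagrams but does not define them. The following is a minimal sketch, assuming the standard Cohen's kappa and the usual pairwise construction of a κ-error diagram (one point per pair of base classifiers: their mutual kappa against their mean error). The function names `cohen_kappa`, `error_rate`, and `kappa_error_points` are hypothetical and are not taken from the paper, and the paper's exact evaluation procedure may differ.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa between two equally long label sequences."""
    n = len(a)
    # Fraction of positions where the two sequences agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from the two label distributions.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:  # both sequences hold the same single label
        return 1.0
    return (observed - expected) / (1.0 - expected)


def error_rate(y_true, y_pred):
    """Fraction of misclassified samples."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def kappa_error_points(y_true, predictions):
    """Return one (kappa, mean error) point per pair of base classifiers.

    `predictions` is a list of label sequences, one per base classifier
    (an assumed input format for this sketch).
    """
    points = []
    for p, q in combinations(predictions, 2):
        kappa = cohen_kappa(p, q)
        mean_error = (error_rate(y_true, p) + error_rate(y_true, q)) / 2
        points.append((kappa, mean_error))
    return points
```

Calling `cohen_kappa(y_true, y_pred)` gives the Kappa coefficient as a performance measure. In a κ-error diagram, points further to the left (lower pairwise kappa) indicate more diverse classifier pairs, and points lower down indicate more accurate ones.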