Computer Science, 2022, Vol. 49, Issue (1): 108-114. doi: 10.11896/jsjkx.201200189
LIU Zhen-yu (刘振宇)¹, SONG Xiao-ying (宋晓莹)²
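Abstract: Regression models such as linear regression, SVR, and most multivariate regression trees cannot use categorical attributes directly in regression analysis. To address this problem, this paper proposes a decision-tree node-splitting method that can combine attributes of several types. The method defines the center of a sample set on a categorical attribute and the distance from a sample to that center. With these definitions, categorical attributes can take part in clustering the samples in the same way as numerical attributes, and the clustering yields a partition of the sample set. The paper then selects a suitable ensemble scheme for the decision trees that this method produces. The resulting ensemble is called the clustering regression forest (CRF). Finally, CRF is compared with 9 other regression models on 12 public UCI datasets in terms of mean absolute error (MAE) and root mean square error (RMSE). The experimental results show that CRF performs best among the 10 regression models.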
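The abstract does not give formulas for the categorical center or for the sample-to-center distance. The Python sketch below shows one plausible way such a clustering-based node split could work. It rests on assumptions that are not taken from the paper:

- The center of a categorical attribute is the relative frequency of each category in the sample set.
- A sample's distance to that center is 1 minus the share of the sample's own category, multiplied by a hypothetical weight `gamma`.
- Numerical attributes use the mean and the squared difference, and they are assumed to be pre-scaled to a comparable range.
- A node is split into two children by a 2-means-style loop.

The names `center`, `distance`, and `cluster_split` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Set, Union

Value = Union[float, str]
Center = List[Union[float, Dict[str, float]]]


def center(rows: Sequence[Sequence[Value]], cat_cols: Set[int]) -> Center:
    """Hypothetical center of a sample set (assumed definition, not the paper's).

    Numerical column   -> arithmetic mean of the column.
    Categorical column -> relative frequency of each category in the set.
    """
    n = len(rows)
    result: Center = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        if j in cat_cols:
            result.append({v: c / n for v, c in Counter(column).items()})
        else:
            result.append(sum(column) / n)
    return result


def distance(row: Sequence[Value], c: Center, cat_cols: Set[int], gamma: float = 1.0) -> float:
    """Hypothetical sample-to-center distance (assumed definition).

    Numerical part: squared difference to the column mean.
    Categorical part: gamma * (1 - share of the sample's category in the set),
    which is 0 when every sample has that category and 1 when none has it.
    """
    d = 0.0
    for j, v in enumerate(row):
        if j in cat_cols:
            d += gamma * (1.0 - c[j].get(v, 0.0))
        else:
            d += (v - c[j]) ** 2
    return d


def cluster_split(rows, cat_cols, gamma=1.0, n_iter=20, seed=0) -> List[int]:
    """Partition a node's samples into two children with a 2-means-style loop.

    Returns one group label (0 or 1) per row.
    """
    if len(rows) < 2:
        return [0] * len(rows)
    rng = random.Random(seed)
    # Seed the two centers with two distinct random samples.
    i, k = rng.sample(range(len(rows)), 2)
    centers = [center([rows[i]], cat_cols), center([rows[k]], cat_cols)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assign every sample to its nearest center.
        new_labels = [
            min((0, 1), key=lambda g: distance(r, centers[g], cat_cols, gamma))
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        groups = [[r for r, g in zip(rows, labels) if g == t] for t in (0, 1)]
        if not groups[0] or not groups[1]:
            break  # one child would be empty; stop with the current labels
        # Recompute both centers from their current members.
        centers = [center(grp, cat_cols) for grp in groups]
    return labels


if __name__ == "__main__":
    rows = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (5.0, "b"), (5.1, "b"), (5.2, "b")]
    y = [1.0, 1.2, 0.9, 7.8, 8.1, 8.0]
    labels = cluster_split(rows, cat_cols={1})
    # Each child could predict, e.g., the mean target of its members.
    for t in (0, 1):
        members = [yi for yi, g in zip(y, labels) if g == t]
        print(t, sum(members) / len(members))
```

In a full regression tree, `cluster_split` would be applied recursively to each child until a stopping rule is met. The abstract does not specify how CRF evaluates candidate splits, how it weights mixed attribute types, or which ensemble scheme it uses. Treat the choices above as illustrative only.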