Computer Science ›› 2022, Vol. 49 ›› Issue (1): 108-114. DOI: 10.11896/jsjkx.201200189

• Database & Big Data & Data Science •

Multivariate Regression Forest for Categorical Attribute Data

LIU Zhen-yu1, SONG Xiao-ying2   

  1 School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
    2 School of Computer, Dalian Neusoft University of Information, Dalian, Liaoning 116023, China
  • Received: 2020-12-22  Revised: 2021-03-14  Online: 2022-01-15  Published: 2022-01-18
  • About author: LIU Zhen-yu, born in 1978, postgraduate, professor. His main research interests include machine learning and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61772101).

Abstract: Because categorical attributes cannot be used directly in regression models such as linear regression, SVR, and most multivariate regression trees, this paper proposes a multivariate split method that handles multiple types of data. We define the centers of the sample sets on the categorical attributes, along with the distances from samples to those centers, so that categorical attributes can participate in the clustering process in the same way as numerical attributes. A suitable ensemble scheme is then chosen for the decision trees generated by this method, yielding an ensemble called the cluster regression forest (CRF). Finally, CRF is compared with nine other regression models in terms of mean absolute error (MAE) and root mean square error (RMSE) on 12 UCI public data sets. The experimental results show that CRF performs best among the ten models.
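Only the abstract is reproduced on this page, so the paper's exact center and distance definitions are not given here. The sketch below is a minimal illustration of the idea, assuming a frequency-table "center" for each categorical attribute and a distance of one minus the relative frequency of a sample's category; every name in it (`categorical_center`, `categorical_distance`, `mixed_distance`) is hypothetical rather than taken from the paper. Under such a distance, categorical attributes can take part in the same clustering step as numerical ones, which is the mechanism the abstract describes; MAE and RMSE are included as the two comparison metrics it names.

```python
import numpy as np
from collections import Counter

def categorical_center(values):
    """'Center' of a categorical column: its category-frequency table.
    (Hypothetical stand-in for the paper's center definition.)"""
    n = len(values)
    return {cat: count / n for cat, count in Counter(values).items()}

def categorical_distance(value, center):
    """Distance from one sample to a categorical center: one minus the
    relative frequency of its category, so common categories lie close
    to the center and rare or unseen ones lie far away."""
    return 1.0 - center.get(value, 0.0)

def mixed_distance(x_num, x_cat, num_center, cat_centers):
    """Combined distance over numerical and categorical attributes, letting
    both attribute types drive the same clustering-based split."""
    d_num = np.sum((np.asarray(x_num, dtype=float) - num_center) ** 2)
    d_cat = sum(categorical_distance(v, c) ** 2 for v, c in zip(x_cat, cat_centers))
    return float(np.sqrt(d_num + d_cat))

def mae(y_true, y_pred):
    """Mean absolute error, one of the two comparison metrics in the abstract."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root mean square error, the other comparison metric."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Example: three samples of one categorical attribute
center = categorical_center(["red", "red", "blue"])
print(categorical_distance("red", center))    # 1 - 2/3 ≈ 0.33 (common -> near center)
print(categorical_distance("green", center))  # 1 - 0   = 1.00 (unseen -> far away)
```

A 2-means-style clustering of the samples at a node under `mixed_distance` would then induce a binary multivariate split over both attribute types, and bagging the resulting trees gives an ensemble in the spirit of the cluster regression forest.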

Key words: Decision trees, Ensemble learning, Gradient boosting, Multivariate regression trees, Random forest

CLC Number: TP393