Computer Science (计算机科学), 2020, Vol. 47, Issue 11A: 454-458. doi: 10.11896/jsjkx.200600002
王晓晖1, 张亮1, 李俊清1,2, 孙玉翠1, 田捷1, 韩睿毅1
WANG Xiao-hui1, ZHANG Liang1, LI Jun-qing1,2, SUN Yu-cui1, TIAN Jie1, HAN Rui-yi1
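Abstract: Regression prediction is an important research direction in machine learning and has broad applications. To further improve the accuracy of regression prediction, an improved XGBoost method based on the genetic algorithm and random forest (GA_XGBoost_RF) is proposed. First, exploiting the good search ability and flexibility of the genetic algorithm (GA), and taking the mean cross-validation score as the objective function value, the parameters of the XGBoost algorithm and the random forest (RF) algorithm are tuned, a good parameter set is selected for each, and the GA_XGBoost and GA_RF models are built. Then GA_XGBoost and GA_RF are combined with variable weights: taking the mean squared error between the predicted and true values on the training set as the objective function, the genetic algorithm is used to determine the model weights. Experiments on UCI datasets show that, compared with XGBoost, Random Forest, GA_XGBoost, and GA_RF, the GA_XGBoost_RF method outperforms the single models in mean squared error, absolute error, and goodness of fit on most datasets. In goodness of fit, the proposed method improves by about 0.01% to 2.1% across datasets, which indicates that it is an effective regression prediction method.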
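The weight-combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single scalar weight `w` in [0, 1] applied as `w * pred_a + (1 - w) * pred_b`, a simple elitist GA with arithmetic crossover and Gaussian mutation, and hard-coded toy predictions standing in for the outputs of the tuned GA_XGBoost and GA_RF models. All function names, parameters, and data below are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def combine(w, pred_a, pred_b):
    """Weighted blend of two prediction vectors (assumed form of the combination)."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def ga_search_weight(y_true, pred_a, pred_b, pop_size=30, generations=60,
                     mutation_scale=0.1, seed=0):
    """Search for w in [0, 1] that minimizes the training MSE of the blend."""
    rng = random.Random(seed)

    def fitness(w):
        return mse(y_true, combine(w, pred_a, pred_b))

    # Seed the population with both endpoints (each single model on its own)
    # plus random candidates.
    population = [0.0, 1.0] + [rng.random() for _ in range(pop_size - 2)]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 2]  # selection: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            alpha = rng.random()
            child = alpha * p1 + (1.0 - alpha) * p2      # arithmetic crossover
            child += rng.gauss(0.0, mutation_scale)      # Gaussian mutation
            children.append(min(1.0, max(0.0, child)))   # clip back into [0, 1]
        population = elite + children
    return min(population, key=fitness)


# Toy training data (hypothetical): model A over-predicts by 0.4 and
# model B under-predicts by 0.6.
y_train = [3.0, 5.0, 7.0, 9.0]
pred_a = [3.4, 5.4, 7.4, 9.4]
pred_b = [2.4, 4.4, 6.4, 8.4]

w = ga_search_weight(y_train, pred_a, pred_b)
print("weight:", w, "blended MSE:", mse(y_train, combine(w, pred_a, pred_b)))
```

Because the best individual is always carried into the next generation and the initial population contains `w = 0` and `w = 1`, the blend found by this sketch is never worse on the training set than either model alone. In practice the two prediction vectors would come from the tuned models, and the same GA loop could be reused for the hyperparameter search by swapping the fitness function for a mean cross-validation score.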