Computer Science, 2020, Vol. 47, Issue (11A): 454-458. doi: 10.11896/jsjkx.200600002

• Big Data & Data Science

Study on XGBoost Improved Method Based on Genetic Algorithm and Random Forest

WANG Xiao-hui¹, ZHANG Liang¹, LI Jun-qing¹,², SUN Yu-cui¹, TIAN Jie¹, HAN Rui-yi¹

  1. School of Information Science and Engineering, Shandong Agricultural University, Taian, Shandong 271018, China
  2. Agricultural Big Data Research Center, Shandong Agricultural University, Taian, Shandong 271018, China
  • Online: 2020-11-15  Published: 2020-11-17
  • About author: WANG Xiao-hui, born in 1998, undergraduate. His main research interests include machine learning.
    LI Jun-qing, born in 1984, postgraduate, associate professor. His main research interests include artificial intelligence and big data.
  • Supported by:
    This work was supported by the project "Joint Flood Control Operation of Reservoir Groups in River Basins Driven by Big Data" (2019GSF111043).

Abstract: Regression prediction is an important research direction in machine learning with a broad range of applications. To improve the accuracy of regression prediction, this paper proposes an improved XGBoost method (GA_XGBoost_RF) based on a genetic algorithm and random forest. First, exploiting the search ability and flexibility of the genetic algorithm (GA), the parameters of XGBoost and random forest (RF) are optimized using the mean cross-validation score as the objective function, and the best parameter sets found are used to build the GA_XGBoost and GA_RF models, respectively. Then GA_XGBoost and GA_RF are combined with variable weights: the genetic algorithm determines the model weights, using the mean squared error between the predicted and true values on the training set as the objective function. Experiments on UCI data sets show that, compared with XGBoost, RF, GA_XGBoost and GA_RF, GA_XGBoost_RF achieves a better fit on most data sets in terms of mean squared error (MSE) and absolute error, outperforming the single models, with fitting improvements of about 0.01% to 2.1% across data sets. The proposed method is therefore an effective regression prediction method.
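Both GA steps in the abstract follow one pattern: search a bounded parameter box for the values that maximize a fitness score. The sketch below is a minimal real-coded GA in plain Python, applied to the second step (choosing the combination weight that minimizes training-set MSE). It is not the authors' implementation. Assumptions not taken from the paper: the hypothetical names `genetic_search`, `combine` and `fit_combination_weight`; truncation selection, arithmetic crossover and Gaussian mutation; the population size and generation count; a single weight pair w and 1 - w that sums to one; and synthetic predictions that merely stand in for GA_XGBoost and GA_RF outputs.

```python
import random
from typing import Callable, List, Sequence, Tuple


def genetic_search(
    fitness: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 30,
    generations: int = 60,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> List[float]:
    """Maximize `fitness` over the box `bounds` with a simple real-coded GA (illustrative)."""
    rng = random.Random(seed)

    def clip(ind: List[float]) -> List[float]:
        # Keep every gene inside its [lo, hi] range after mutation.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(ind, bounds)]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        if fitness(ranked[0]) > fitness(best):
            best = ranked[0]
        parents = ranked[: max(2, pop_size // 2)]  # truncation selection: keep the fitter half
        children = [best[:]]  # elitism: the best-so-far individual always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()
            child = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]  # arithmetic crossover
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian mutation
            children.append(clip(child))
        population = children
    final = max(population, key=fitness)
    return final if fitness(final) > fitness(best) else best


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def combine(w: float, pred_a: Sequence[float], pred_b: Sequence[float]) -> List[float]:
    """Weighted combination w * A + (1 - w) * B (weights assumed to sum to one)."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


def fit_combination_weight(y_true, pred_a, pred_b, seed: int = 0) -> float:
    """Choose w in [0, 1] by GA, maximizing the negative training MSE."""
    best = genetic_search(
        lambda ind: -mse(y_true, combine(ind[0], pred_a, pred_b)),
        bounds=[(0.0, 1.0)],
        seed=seed,
    )
    return best[0]


if __name__ == "__main__":
    rng = random.Random(42)
    y = [rng.uniform(0.0, 10.0) for _ in range(200)]
    # Synthetic stand-ins for GA_XGBoost and GA_RF training-set predictions.
    pred_xgb = [t + rng.gauss(0.0, 1.0) for t in y]
    pred_rf = [t + rng.gauss(0.0, 2.0) for t in y]
    w = fit_combination_weight(y, pred_xgb, pred_rf)
    print(f"weight on model A: {w:.3f}, weight on model B: {1 - w:.3f}")
    print(f"MSE A={mse(y, pred_xgb):.3f}  B={mse(y, pred_rf):.3f}  "
          f"combined={mse(y, combine(w, pred_xgb, pred_rf)):.3f}")
```

With independent noise of standard deviation 1 and 2, the weight on model A should land near the inverse-variance value of about 0.8, though the exact number depends on the sample. For the first step, the same `genetic_search` could in principle take the mean cross-validation score as its fitness, for example by rounding integer genes such as a tree depth and averaging scikit-learn's `cross_val_score` for an `xgboost.XGBRegressor` built from them; that wiring is an assumption and is not shown here.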

Key words: Combination prediction, Genetic algorithm, Random forest, Regression prediction, XGBoost

CLC Number: TP181