Computer Science ›› 2024, Vol. 51 ›› Issue (8): 242-255. doi: 10.11896/jsjkx.230600164

• Artificial Intelligence •

Evaluation of Hyperparameter Optimization Techniques for Traditional Machine Learning Models

LI Haixia1, SONG Danlei2, KONG Jianing2, SONG Yafei3, CHANG Haiyan1   

  1. North Automatic Control Technology Institute, Taiyuan 030006, China
    2. School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China
    3. Air and Missile Defense College, Air Force Engineering University, Xi’an 710051, China
  • Received: 2023-06-20 Revised: 2023-11-23 Online: 2024-08-15 Published: 2024-08-13
  • About author: LI Haixia, born in 1984, bachelor, researcher. Her main research interests include information system design and information fusion technology.
    SONG Yafei, born in 1988, Ph.D, associate professor, postgraduate supervisor. His main research interests include intelligent reasoning and decision-making.
  • Supported by:
    National Natural Science Foundation of China (61806219, 61876189), Young Talent Fostering Program of the Science and Technology Associations of Universities in Shaanxi Province, China (20220106) and Innovation Capacity Support Program of Shaanxi Province, China (2020KJXX-065).

Abstract: Reasonable hyperparameters enable machine learning models to adapt to different backgrounds and tasks. To avoid the inefficiency of manually tuning a large number of model hyperparameters over a vast search space, a variety of hyperparameter optimization techniques have been developed and applied to machine learning model training. This paper first reviews eight common hyperparameter optimization techniques: grid search, random search, Bayesian optimization, Hyperband, Bayesian optimization and Hyperband (BOHB), genetic algorithms, particle swarm optimization, and the covariance matrix adaptation evolution strategy (CMA-ES). The advantages and disadvantages of these methods are analyzed from five aspects: time performance, final results, parallel capability, scalability, and robustness and flexibility. The eight methods are then applied to four traditional machine learning models, LightGBM, XGBoost, Random Forest, and K-Nearest Neighbors (KNN), and regression, binary classification and multi-classification experiments are performed on four standard datasets: the Boston house price dataset, the kin8nm robot arm dataset, the credit card default customer dataset and the handwritten digit dataset. The methods are compared on the resulting evaluation metrics. Finally, the pros and cons of each method are summarized and suitable application scenarios for each are given. The results highlight the importance of selecting an appropriate hyperparameter optimization method to improve the efficiency and effectiveness of machine learning model training.
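To make the two simplest of the reviewed search strategies concrete, the following is a minimal sketch (a hypothetical illustration, not the paper's experimental setup) that tunes a Random Forest classifier on scikit-learn's handwritten digit dataset with grid search and random search; the search space, cross-validation folds and trial budget below are illustrative assumptions.

# Illustrative sketch only (hypothetical, not the paper's exact configuration):
# tuning a Random Forest on scikit-learn's handwritten digit dataset with two
# of the reviewed techniques, grid search and random search. The search space,
# CV folds and trial budget are assumptions chosen for brevity.
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Grid search: exhaustively evaluates every combination in a fixed grid.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    cv=5, scoring="accuracy", n_jobs=-1)
grid.fit(X_train, y_train)

# Random search: samples a fixed budget of configurations from distributions,
# which usually covers a vast search space more efficiently than a full grid.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(100, 500), "max_depth": randint(5, 30)},
    n_iter=20, cv=5, scoring="accuracy", random_state=0, n_jobs=-1)
rand.fit(X_train, y_train)

for name, search in [("grid search", grid), ("random search", rand)]:
    print(f"{name}: best CV accuracy={search.best_score_:.4f}, "
          f"test accuracy={search.score(X_test, y_test):.4f}, best params={search.best_params_}")

The same estimator/search interface extends to the other models named in the abstract (LightGBM and XGBoost, for example, provide scikit-learn-compatible estimators), whereas the model-based and evolutionary optimizers reviewed in the paper are typically driven through dedicated libraries.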

Key words: Traditional machine learning, Hyperparameter optimization, Bayesian optimization, Multi-fidelity technology, Meta-heuristic algorithms

CLC Number: TP181