Survey on Bayesian Optimization Methods for Hyper-parameter Tuning

doi:10.11896/jsjkx.210300208

Abstract

Abstract: For most machine learning models, hyper-parameter selection plays a crucial role in final model performance. In current practice, most hyper-parameters are still set manually, so the selection and estimation of hyper-parameters is a key issue in both the theory and practice of machine learning. The mapping from a point in the hyper-parameter space to the model's generalization performance can be regarded as a complex black-box function that is expensive to evaluate, which makes general optimization methods difficult to apply. Bayesian optimization is a very effective global optimization algorithm. It is well suited to problems whose objective functions have no explicit analytic form, are non-convex, or are expensive to evaluate, and it can reach a satisfactory solution with relatively few objective function evaluations. This paper summarizes the basic theory and methods of Bayesian optimization for hyper-parameter estimation, reviews the research hot spots and latest developments of recent years, including work on surrogate models, acquisition functions, and algorithm implementation, and summarizes the problems that remain open in existing research. It is intended to help beginners quickly understand Bayesian optimization and its typical algorithmic ideas, and to provide some guidance for their subsequent research.

Key words: Bayesian optimization, Black box optimization, Hyper-parameters, Machine learning, Probabilistic surrogate model

CLC number: TP181

LI Ya-ru, ZHANG Yu-lai, WANG Jia-chen. Survey on Bayesian Optimization Methods for Hyper-parameter Tuning[J]. Computer Science, 2022, 49(6A): 86-92. https://doi.org/10.11896/jsjkx.210300208
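
The abstract describes Bayesian optimization as a loop built from a probabilistic surrogate model and an acquisition function. The following is a minimal sketch of that loop, not the authors' implementation. It assumes a single hyper-parameter in [0, 1], a Gaussian-process surrogate with a fixed squared-exponential kernel, and expected improvement as the acquisition function. Every name here (`validation_loss`, `fit_gp`, `predict`, `bayes_opt`) is hypothetical, and the toy objective only stands in for an expensive train-and-validate run.

```python
import math


def validation_loss(x):
    """Hypothetical stand-in for one expensive train-and-validate run."""
    return (x - 0.3) ** 2 + 0.1 * math.sin(15 * x)


def rbf(a, b, length=0.15):
    """Squared-exponential kernel between two scalars (assumed fixed length-scale)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def fit_gp(xs, ys, noise=1e-6):
    """Fit the surrogate: factor the kernel matrix once per iteration."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    z = solve_lower(L, [y - mean_y for y in ys])
    return L, z, mean_y


def predict(xs, model, x_new):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    L, z, mean_y = model
    v = solve_lower(L, [rbf(a, x_new) for a in xs])
    mu = mean_y + sum(vi * zi for vi, zi in zip(v, z))
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)  # prior variance is 1
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, best):
    """Expected improvement over the best observed loss (minimization)."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(objective, n_iter=12):
    """Run the surrogate/acquisition loop; return all evaluated points and losses."""
    grid = [i / 200 for i in range(201)]  # candidate hyper-parameter values
    xs = [0.0, 0.5, 1.0]                  # small initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        model = fit_gp(xs, ys)
        best = min(ys)

        def score(c):
            mu, sigma = predict(xs, model, c)
            return expected_improvement(mu, sigma, best)

        # Evaluate the objective only where the acquisition function is largest.
        x_next = max((c for c in grid if c not in xs), key=score)
        xs.append(x_next)
        ys.append(objective(x_next))
    return xs, ys
```

Calling `bayes_opt(validation_loss)` spends 15 objective evaluations in total (3 initial plus 12 chosen by the acquisition function). In practice the kernel hyper-parameters would be learned from the data and the acquisition function maximized with a proper optimizer rather than a fixed grid; both are simplified here to keep the sketch short.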

