在线核选择的对抗式多臂赌博机模型

doi:10.11896／j.issn.1002-137X.2019.01.009

摘要/Abstract

摘要： 在线核选择是在线核方法的重要工作,可分为过滤式、包裹式和嵌入式3种类型。已有在线核选择探索了包裹式方法和嵌入式方法,也经验地采用了过滤式方法,但迄今尚没有一个统一的框架来比较、分析并研究各种在线核选择问题。文中提出一种在线核选择的多臂赌博机模型,该模型可作为一个统一框架,同时给出在线核选择的包裹式方法和嵌入式方法。给定候选核集合,候选集中的一个核对应多臂赌博机模型中的一个臂,在线核选择的每回合依据一个概率分布重复地随机选择多个核,并应用指数加权的方法来更新该概率分布。这样,在线核选择问题本质上可归约为一个非遗忘对手环境下的对抗式多臂赌博机问题,并可应用对抗式多臂赌博机模型统一地给出在线核选择的包裹式方法和嵌入式方法。文中进一步提出一个新的在线核选择后悔的概念,理论证明包裹式方法具有关于回合数亚线性的弱期望后悔界,并且嵌入式方法具有关于回合数亚线性的期望后悔界。最后,在标准数据集上通过实验验证了所提统一框架的可行性。

关键词: 对抗式多臂赌博机, 非遗忘对手, 统一框架, 在线核选择

Abstract: Online kernel selection is an important component of online kernel methods,and it can be classified into three categories,that is,the filter,the wrapper and the embedder.Existing online kernel selection explores the wrapper and the embedder categories,and empirically adopts the filter approach.But there have been no unified frameworks yet for comparing,analyzing and investigating online kernel selection problems.This paper proposed a unified framework for online kernel selection researches via multi-armed bandits,which can model the wrapper and the embedder of online kernel selection simultaneously.Giving a set of candidate kernels,this paper corresponds each kernel to an arm in an adversarial bandit model.At each round of online kernel selection,this paper randomly chose multiple kernels according to a probability distribution,and updated the probability distribution via the exponentially weighted average method.In this way,an online kernel selection problem was reduced to an adversarial bandit problem in a non-oblivious adversary setting,and a unified framework was developed for online kernel selection researches,which can model the wrapper and the embedder uniformly.This paper further defined a new regret concept of online kernel selection,and proved that the wrapper within the framework enjoys a sub-linear weak expected regret bound and the embedder within the framework enjoys a sub-linear expected regret bound.Experimental results on benchmark datasets demonstrate the effectiveness of the proposed unified framework.

Key words: Adversarial multi-armed bandit, Non-oblivious adversary, Online kernel selection, Unified framework

中图分类号:

TP181

李峻樊, 廖士中. 在线核选择的对抗式多臂赌博机模型[J]. 计算机科学, 2019, 46(1): 57-63. https://doi.org/10.11896／j.issn.1002-137X.2019.01.009

LI Jun-fan, LIAO Shi-zhong. Adversarial Multi-armed Bandit Model with Online Kernel Selection[J]. Computer Science, 2019, 46(1): 57-63. https://doi.org/10.11896／j.issn.1002-137X.2019.01.009

参考文献

[1]SHALEV-SHWARTZ S.Online Learning and Online Convex Optimization [J].Foundations & Trends® in Machine Lear-ning,2012,4(2):107-194. [2]NGUYEN K.Nonparametric online machine learning with kernels [C]//Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence.Morgan Kaufmann,2017:5197-5198. [3]STONE M.Cross-validatory choice and assessment of statistical predictions [J].Journal of the Royal Statistical Society,Series B (Statistical Methodology),1974,36(2):111-147. [4]BARTLETT P L,BOUCHERON S,LUGOSI G.Model Selection and Error Estimation [J].Machine Learning,2002,48(1):85-113. [5]FOSTER D J,KALE S,MOHRI M,et al.Parameter-free Online Learning via Model Selection [J].Advances in Neural Information Processing Systems,2017,30:6022-6032. [6]GUYON I,SAFFARI A,DROR G,et al.Model Selection:Beyond the Bayesian/Frequentist Divide [J].Journal of Machine Learning Research,2010,11(1):61-87. [7]DEKEL O,SHALEV-SHWARTZ S,SINGER Y.The Forge-tron:A Kernel-Based Perceptron On A Budget[J].Siam Journal on Computing,2008,37(5):1342-1372. [8]ZHAO P,WANG J,WU P,et al.Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-based Online Learning [C]//Proceedings of the 29th International Conference on Machine Learning.ACM,2012:1075-1082. [9]FAN H,SONG Q,SHRESTHA S B.Kernel Online Learning with Adaptive Kernel Width [J].Neurocomputing,2016,175:233-242. [10]CHEN B,LIANG J,ZHENG N,et al.Kernel Least Mean Square with Adaptive Kernel Size [J].Neurocomputing,2016,191:95-106. [11]NGUYEN T D,LE T,BUI H,et al.Large-scale Online Kernel Learning with Random Feature Reparameterization [C]//Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence.Morgan Kaufmann,2017:2543-2549. [12]HAN Z,LIAO S.Stochastic Online Kernel Selection with In- stantaneous Loss in Random Feature Space [C]//Proceedings of the 24th International Conference on Neural Information Processing.Springer,2017:33-42. [13]YANG T,MAHDAVI M,JIN R,et al.Online Kernel Selection:Algorithms and Evaluations [C]//Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence.AAAI,2012:1197-1202. [14]AUER P,CESA-BIANCHI N,FREUND Y,et al.The Nonstochastic Multi-armed Bandit Problem [J].SIAM Journal on Computing,2002,32(1):48-77. [15]BUBECK S,CESA-BIANCHI N.Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems [J].Foundations & Trends® in Machine Learning,2012,5(1):1-122. [16]ARORA R,DEKEL O,TEWARI A.Online Dandit Learning Against an Adaptive Adversary:From Regret to Policy Regret [C]//Proceedings of the 29th International Conference on Machine Learning.ACM,2012:1503-1510. [17]JIN R,HOI S C H,YANG T.Online Multiple Kernel Learning:Algorithms and Mistake Bounds [C]//Proceedings of the 21st International Conference on Algorithmic Learning Theory.Springer,2010:390-404. [18]LU J,HOI S C H,WANG J,et al.Large Scale Online Kernel Learning [J].Journal of Machine Learning Research,2016,17(47):1-43. [19]RAHIMI A,RECHT B.Random Features for Large-scale Kernel Machines [J].Advances in Neural Information Processing Systems,2007,20:1177-1184. [20]CHANG C C,LIN C J.LIBSVM:A Library for Support Vector Machines [J].ACM Transactions on Intelligent Systems and Technology,2011,2(3):1-27. [21]LICHMAN M.UCI Machine Learning Repository[EB/OL].http://archive.ics.uci.edu/ml/index.php.

相关文章 15

[1]	程章桃, 钟婷, 张晟铭, 周帆. 基于图学习的推荐系统研究综述 Survey of Recommender Systems Based on Graph Learning 计算机科学, 2022, 49(9): 1-13. https://doi.org/10.11896/jsjkx.210900072
[2]	熊丽琴, 曹雷, 赖俊, 陈希亮. 基于值分解的多智能体深度强化学习综述 Overview of Multi-agent Deep Reinforcement Learning Based on Value Factorization 计算机科学, 2022, 49(9): 172-182. https://doi.org/10.11896/jsjkx.210800112
[3]	齐秀秀, 王佳昊, 李文雄, 周帆. 基于概率元学习的矩阵补全预测融合算法 Fusion Algorithm for Matrix Completion Prediction Based on Probabilistic Meta-learning 计算机科学, 2022, 49(7): 18-24. https://doi.org/10.11896/jsjkx.210600126
[4]	高振卓, 王志海, 刘海洋. 嵌入典型时间序列特征的随机Shapelet森林算法 Random Shapelet Forest Algorithm Embedded with Canonical Time Series Features 计算机科学, 2022, 49(7): 40-49. https://doi.org/10.11896/jsjkx.210700226
[5]	孙晓寒, 张莉. 基于评分区域子空间的协同过滤推荐算法 Collaborative Filtering Recommendation Algorithm Based on Rating Region Subspace 计算机科学, 2022, 49(7): 50-56. https://doi.org/10.11896/jsjkx.210600062
[6]	刘卫明, 安冉, 毛伊敏. 基于聚类和WOA的并行支持向量机算法 Parallel Support Vector Machine Algorithm Based on Clustering and WOA 计算机科学, 2022, 49(7): 64-72. https://doi.org/10.11896/jsjkx.210500040
[7]	周慧, 施皓晨, 屠要峰, 黄圣君. 基于主动采样的深度鲁棒神经网络学习 Robust Deep Neural Network Learning Based on Active Sampling 计算机科学, 2022, 49(7): 164-169. https://doi.org/10.11896/jsjkx.210600044
[8]	苏丹宁, 曹桂涛, 王燕楠, 王宏, 任赫. 小样本雷达辐射源识别的深度学习方法综述 Survey of Deep Learning for Radar Emitter Identification Based on Small Sample 计算机科学, 2022, 49(7): 226-235. https://doi.org/10.11896/jsjkx.210600138
[9]	于滨, 李学华, 潘春雨, 李娜. 基于深度强化学习的边云协同资源分配算法 Edge-Cloud Collaborative Resource Allocation Algorithm Based on Deep Reinforcement Learning 计算机科学, 2022, 49(7): 248-253. https://doi.org/10.11896/jsjkx.210400219
[10]	王宇飞, 陈文. 基于DECORATE集成学习与置信度评估的Tri-training算法 Tri-training Algorithm Based on DECORATE Ensemble Learning and Credibility Assessment 计算机科学, 2022, 49(6): 127-133. https://doi.org/10.11896/jsjkx.211100043
[11]	洪志理, 赖俊, 曹雷, 陈希亮, 徐志雄. 基于遗憾探索的竞争网络强化学习智能推荐方法研究 Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration 计算机科学, 2022, 49(6): 149-157. https://doi.org/10.11896/jsjkx.210600226
[12]	陈章辉, 熊贇. 基于解耦-检索-生成的图像风格化描述生成模型 Stylized Image Captioning Model Based on Disentangle-Retrieve-Generate 计算机科学, 2022, 49(6): 180-186. https://doi.org/10.11896/jsjkx.211100129
[13]	徐辉, 康金梦, 张加万. 基于特征感知的数字壁画复原方法 Digital Mural Inpainting Method Based on Feature Perception 计算机科学, 2022, 49(6): 217-223. https://doi.org/10.11896/jsjkx.210500105
[14]	许杰, 祝玉坤, 邢春晓. 机器学习在金融资产定价中的应用研究综述 Application of Machine Learning in Financial Asset Pricing:A Review 计算机科学, 2022, 49(6): 276-286. https://doi.org/10.11896/jsjkx.210900127
[15]	罗俊仁, 张万鹏, 陆丽娜, 陈璟. 即时策略博弈在线对抗规划方法综述 Survey on Online Adversarial Planning for Real-time Strategy Game 计算机科学, 2022, 49(6): 287-296. https://doi.org/10.11896/jsjkx.210600168

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed