计算机科学 ›› 2019, Vol. 46 ›› Issue (1): 57-63.doi: 10.11896/j.issn.1002-137X.2019.01.009

• 2018 年第七届中国数据挖掘会议 • 上一篇    下一篇

在线核选择的对抗式多臂赌博机模型

李峻樊, 廖士中   

  1. (天津大学智能与计算学部 天津300350)
  • 收稿日期:2018-06-18 出版日期:2019-01-15 发布日期:2019-02-25
  • 作者简介:李峻樊(1992-),男,硕士,主要研究方向为机器学习和模型选择,E-mail:junfli@tju.edu.cn;廖士中(1964-),男,博士,教授,CCF会员,主要研究方向为人工智能、机器学习和理论计算机科学,E-mail:szliao@tju.edu.cn(通信作者)。
  • 基金资助:
    国家自然科学基金项目(61673293)资助

Adversarial Multi-armed Bandit Model with Online Kernel Selection

LI Jun-fan, LIAO Shi-zhong   

  1. (College of Intelligence and Computing,Tianjin University,Tianjin 300350,China)
  • Received:2018-06-18 Online:2019-01-15 Published:2019-02-25

摘要: 在线核选择是在线核方法的重要工作,可分为过滤式、包裹式和嵌入式3种类型。已有在线核选择探索了包裹式方法和嵌入式方法,也经验地采用了过滤式方法,但迄今尚没有一个统一的框架来比较、分析并研究各种在线核选择问题。文中提出一种在线核选择的多臂赌博机模型,该模型可作为一个统一框架,同时给出在线核选择的包裹式方法和嵌入式方法。给定候选核集合,候选集中的一个核对应多臂赌博机模型中的一个臂,在线核选择的每回合依据一个概率分布重复地随机选择多个核,并应用指数加权的方法来更新该概率分布。这样,在线核选择问题本质上可归约为一个非遗忘对手环境下的对抗式多臂赌博机问题,并可应用对抗式多臂赌博机模型统一地给出在线核选择的包裹式方法和嵌入式方法。文中进一步提出一个新的在线核选择后悔的概念,理论证明包裹式方法具有关于回合数亚线性的弱期望后悔界,并且嵌入式方法具有关于回合数亚线性的期望后悔界。最后,在标准数据集上通过实验验证了所提统一框架的可行性。

关键词: 对抗式多臂赌博机, 非遗忘对手, 统一框架, 在线核选择

Abstract: Online kernel selection is an important component of online kernel methods,and it can be classified into three categories,that is,the filter,the wrapper and the embedder.Existing online kernel selection explores the wrapper and the embedder categories,and empirically adopts the filter approach.But there have been no unified frameworks yet for comparing,analyzing and investigating online kernel selection problems.This paper proposed a unified framework for online kernel selection researches via multi-armed bandits,which can model the wrapper and the embedder of online kernel selection simultaneously.Giving a set of candidate kernels,this paper corresponds each kernel to an arm in an adversarial bandit model.At each round of online kernel selection,this paper randomly chose multiple kernels according to a probability distribution,and updated the probability distribution via the exponentially weighted average method.In this way,an online kernel selection problem was reduced to an adversarial bandit problem in a non-oblivious adversary setting,and a unified framework was developed for online kernel selection researches,which can model the wrapper and the embedder uniformly.This paper further defined a new regret concept of online kernel selection,and proved that the wrapper within the framework enjoys a sub-linear weak expected regret bound and the embedder within the framework enjoys a sub-linear expected regret bound.Experimental results on benchmark datasets demonstrate the effectiveness of the proposed unified framework.

Key words: Adversarial multi-armed bandit, Non-oblivious adversary, Online kernel selection, Unified framework

中图分类号: 

  • TP181
[1]SHALEV-SHWARTZ S.Online Learning and Online Convex Optimization [J].Foundations & Trends<sup>®</sup> in Machine Lear-ning,2012,4(2):107-194.<br /> [2]NGUYEN K.Nonparametric online machine learning with kernels [C]//Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence.Morgan Kaufmann,2017:5197-5198.<br /> [3]STONE M.Cross-validatory choice and assessment of statistical predictions [J].Journal of the Royal Statistical Society,Series B (Statistical Methodology),1974,36(2):111-147.<br /> [4]BARTLETT P L,BOUCHERON S,LUGOSI G.Model Selection and Error Estimation [J].Machine Learning,2002,48(1):85-113.<br /> [5]FOSTER D J,KALE S,MOHRI M,et al.Parameter-free Online Learning via Model Selection [J].Advances in Neural Information Processing Systems,2017,30:6022-6032.<br /> [6]GUYON I,SAFFARI A,DROR G,et al.Model Selection:Beyond the Bayesian/Frequentist Divide [J].Journal of Machine Learning Research,2010,11(1):61-87.<br /> [7]DEKEL O,SHALEV-SHWARTZ S,SINGER Y.The Forge-tron:A Kernel-Based Perceptron On A Budget[J].Siam Journal on Computing,2008,37(5):1342-1372.<br /> [8]ZHAO P,WANG J,WU P,et al.Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-based Online Learning [C]//Proceedings of the 29th International Conference on Machine Learning.ACM,2012:1075-1082.<br /> [9]FAN H,SONG Q,SHRESTHA S B.Kernel Online Learning with Adaptive Kernel Width [J].Neurocomputing,2016,175:233-242.<br /> [10]CHEN B,LIANG J,ZHENG N,et al.Kernel Least Mean Square with Adaptive Kernel Size [J].Neurocomputing,2016,191:95-106.<br /> [11]NGUYEN T D,LE T,BUI H,et al.Large-scale Online Kernel Learning with Random Feature Reparameterization [C]//Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence.Morgan Kaufmann,2017:2543-2549.<br /> [12]HAN Z,LIAO S.Stochastic Online Kernel Selection with In- stantaneous Loss in Random Feature Space [C]//Proceedings of the 24th International Conference on Neural Information Processing.Springer,2017:33-42.<br /> [13]YANG T,MAHDAVI M,JIN R,et al.Online Kernel Selection:Algorithms and Evaluations [C]//Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence.AAAI,2012:1197-1202.<br /> [14]AUER P,CESA-BIANCHI N,FREUND Y,et al.The Nonstochastic Multi-armed Bandit Problem [J].SIAM Journal on Computing,2002,32(1):48-77.<br /> [15]BUBECK S,CESA-BIANCHI N.Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems [J].Foundations & Trends<sup>®</sup> in Machine Learning,2012,5(1):1-122.<br /> [16]ARORA R,DEKEL O,TEWARI A.Online Dandit Learning Against an Adaptive Adversary:From Regret to Policy Regret [C]//Proceedings of the 29th International Conference on Machine Learning.ACM,2012:1503-1510.<br /> [17]JIN R,HOI S C H,YANG T.Online Multiple Kernel Learning:Algorithms and Mistake Bounds [C]//Proceedings of the 21st International Conference on Algorithmic Learning Theory.Springer,2010:390-404.<br /> [18]LU J,HOI S C H,WANG J,et al.Large Scale Online Kernel Learning [J].Journal of Machine Learning Research,2016,17(47):1-43.<br /> [19]RAHIMI A,RECHT B.Random Features for Large-scale Kernel Machines [J].Advances in Neural Information Processing Systems,2007,20:1177-1184.<br /> [20]CHANG C C,LIN C J.LIBSVM:A Library for Support Vector Machines [J].ACM Transactions on Intelligent Systems and Technology,2011,2(3):1-27.<br /> [21]LICHMAN M.UCI Machine Learning Repository[EB/OL].http://archive.ics.uci.edu/ml/index.php.
[1] 程章桃, 钟婷, 张晟铭, 周帆.
基于图学习的推荐系统研究综述
Survey of Recommender Systems Based on Graph Learning
计算机科学, 2022, 49(9): 1-13. https://doi.org/10.11896/jsjkx.210900072
[2] 熊丽琴, 曹雷, 赖俊, 陈希亮.
基于值分解的多智能体深度强化学习综述
Overview of Multi-agent Deep Reinforcement Learning Based on Value Factorization
计算机科学, 2022, 49(9): 172-182. https://doi.org/10.11896/jsjkx.210800112
[3] 齐秀秀, 王佳昊, 李文雄, 周帆.
基于概率元学习的矩阵补全预测融合算法
Fusion Algorithm for Matrix Completion Prediction Based on Probabilistic Meta-learning
计算机科学, 2022, 49(7): 18-24. https://doi.org/10.11896/jsjkx.210600126
[4] 高振卓, 王志海, 刘海洋.
嵌入典型时间序列特征的随机Shapelet森林算法
Random Shapelet Forest Algorithm Embedded with Canonical Time Series Features
计算机科学, 2022, 49(7): 40-49. https://doi.org/10.11896/jsjkx.210700226
[5] 孙晓寒, 张莉.
基于评分区域子空间的协同过滤推荐算法
Collaborative Filtering Recommendation Algorithm Based on Rating Region Subspace
计算机科学, 2022, 49(7): 50-56. https://doi.org/10.11896/jsjkx.210600062
[6] 刘卫明, 安冉, 毛伊敏.
基于聚类和WOA的并行支持向量机算法
Parallel Support Vector Machine Algorithm Based on Clustering and WOA
计算机科学, 2022, 49(7): 64-72. https://doi.org/10.11896/jsjkx.210500040
[7] 周慧, 施皓晨, 屠要峰, 黄圣君.
基于主动采样的深度鲁棒神经网络学习
Robust Deep Neural Network Learning Based on Active Sampling
计算机科学, 2022, 49(7): 164-169. https://doi.org/10.11896/jsjkx.210600044
[8] 苏丹宁, 曹桂涛, 王燕楠, 王宏, 任赫.
小样本雷达辐射源识别的深度学习方法综述
Survey of Deep Learning for Radar Emitter Identification Based on Small Sample
计算机科学, 2022, 49(7): 226-235. https://doi.org/10.11896/jsjkx.210600138
[9] 于滨, 李学华, 潘春雨, 李娜.
基于深度强化学习的边云协同资源分配算法
Edge-Cloud Collaborative Resource Allocation Algorithm Based on Deep Reinforcement Learning
计算机科学, 2022, 49(7): 248-253. https://doi.org/10.11896/jsjkx.210400219
[10] 王宇飞, 陈文.
基于DECORATE集成学习与置信度评估的Tri-training算法
Tri-training Algorithm Based on DECORATE Ensemble Learning and Credibility Assessment
计算机科学, 2022, 49(6): 127-133. https://doi.org/10.11896/jsjkx.211100043
[11] 洪志理, 赖俊, 曹雷, 陈希亮, 徐志雄.
基于遗憾探索的竞争网络强化学习智能推荐方法研究
Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration
计算机科学, 2022, 49(6): 149-157. https://doi.org/10.11896/jsjkx.210600226
[12] 陈章辉, 熊贇.
基于解耦-检索-生成的图像风格化描述生成模型
Stylized Image Captioning Model Based on Disentangle-Retrieve-Generate
计算机科学, 2022, 49(6): 180-186. https://doi.org/10.11896/jsjkx.211100129
[13] 徐辉, 康金梦, 张加万.
基于特征感知的数字壁画复原方法
Digital Mural Inpainting Method Based on Feature Perception
计算机科学, 2022, 49(6): 217-223. https://doi.org/10.11896/jsjkx.210500105
[14] 许杰, 祝玉坤, 邢春晓.
机器学习在金融资产定价中的应用研究综述
Application of Machine Learning in Financial Asset Pricing:A Review
计算机科学, 2022, 49(6): 276-286. https://doi.org/10.11896/jsjkx.210900127
[15] 罗俊仁, 张万鹏, 陆丽娜, 陈璟.
即时策略博弈在线对抗规划方法综述
Survey on Online Adversarial Planning for Real-time Strategy Game
计算机科学, 2022, 49(6): 287-296. https://doi.org/10.11896/jsjkx.210600168
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!