Computer Science (计算机科学), 2018, Vol. 45, Issue (7): 1-6. doi: 10.11896/j.issn.1002-137X.2018.07.001
• 5th CCF Big Data Conference •
ZHAO Xing-yu1, DING Shi-fei1,2 (赵星宇1, 丁世飞1,2)
Abstract: As a novel machine learning approach, deep reinforcement learning combines deep learning with reinforcement learning, enabling an agent to perceive information from a high-dimensional space and, based on that information, to train a model and make decisions. Owing to its generality and effectiveness, deep reinforcement learning has been studied extensively and applied to many areas of daily life. This paper first gives an overview of deep reinforcement learning research and introduces its theoretical foundations; it then presents value-function-based and policy-based deep reinforcement learning algorithms and discusses their application prospects; finally, it summarizes related work and offers an outlook on future research.
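To make the value-function family concrete, the following is a minimal illustrative sketch (not taken from the paper): tabular Q-learning on a hypothetical 5-state chain MDP, the classical algorithm whose action-value update DQN approximates with a deep network. All names and parameters (`N_STATES`, `ALPHA`, etc.) are assumptions for this toy example.

```python
import random

# Toy chain MDP: states 0..4, actions 0 (left) / 1 (right),
# reward 1.0 only on reaching the terminal state 4.
N_STATES, ALPHA, GAMMA, EPS = 5, 0.5, 0.9, 0.1

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # tabular action-value function
rng = random.Random(0)
for _ in range(500):                        # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy exploration over the current Q estimate
        a = rng.randrange(2) if rng.random() < EPS else max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# greedy policy for the non-terminal states; 1 means "move right"
greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

The learned greedy policy moves right toward the reward in every state; DQN replaces the table `Q` with a neural network trained on the same temporal-difference target, which is what allows it to scale to the high-dimensional observations the abstract describes.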