Computer Science ›› 2018, Vol. 45 ›› Issue (7): 1-6.doi: 10.11896/j.issn.1002-137X.2018.07.001

• CCF Big Data 2017 •

Research on Deep Reinforcement Learning

ZHAO Xing-yu1, DING Shi-fei1,2

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China;
  2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

  • Received: 2017-06-12  Online: 2018-07-30  Published: 2018-07-30

Abstract: Deep reinforcement learning is a new machine learning method that combines deep learning with reinforcement learning, enabling an agent to perceive information from a high-dimensional space, train a model, and make decisions according to the received information. Because of its generality and effectiveness, deep reinforcement learning has been widely studied and applied in many fields of daily life. Firstly, an overview of deep reinforcement learning research was given and its basic theory was introduced. Then value-based algorithms and policy-based algorithms were described. After that, the application prospects of deep reinforcement learning were discussed. Finally, the related research was summarized and future directions were prospected.
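The value-based family surveyed here descends from Watkins's Q-learning, whose update rule Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′,a′) − Q(s,a)] deep methods approximate with a neural network. As a minimal, self-contained illustration, here is a tabular (not deep) sketch on a toy environment; the 5-state chain and all constants are illustrative assumptions, not taken from the surveyed papers.

```python
import random

# Tabular Q-learning sketch on a toy 5-state chain: the agent starts at
# state 0 and is rewarded only upon reaching state 4.  The environment
# and hyperparameters are illustrative assumptions.
N_STATES = 5
ACTIONS = (-1, +1)                 # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Chain dynamics: reward 1.0 at the right end, else 0.0."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

def greedy(state):
    """Greedy action with random tie-breaking."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

random.seed(0)
for _ in range(200):                              # training episodes
    state, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        # off-policy TD target: bootstrap with a max over next actions
        target = reward if done else reward + GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

# The learned greedy policy should move right (+1) in every non-terminal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

Value-based deep methods such as DQN replace the table Q with a neural network and stabilize training with experience replay and a target network, while policy-based methods instead parameterize the policy π directly and ascend its gradient.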

Key words: Artificial intelligence, Deep learning, Deep reinforcement learning, Reinforcement learning

CLC Number: TP181
References:
[1] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533.
[2] SILVER D, HUANG A, MADDISON C, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484-489.
[3] LEVINE S, PASTOR P, KRIZHEVSKY A, et al. Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection[C]∥International Symposium on Experimental Robotics. Springer, Cham, 2016: 173-184.
[4] ZHANG M, MCCARTHY Z, FINN C, et al. Learning deep neural network policies with continuous memory states[C]∥Proceedings of the International Conference on Robotics and Automation. Stockholm, Sweden, 2016: 520-527.
[5] LEVINE S, FINN C, DARRELL T, et al. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 2016, 17(39): 1-40.
[6] LENZ I, KNEPPER R, SAXENA A. DeepMPC: learning deep latent features for model predictive control[C]∥Proceedings of Robotics: Science and Systems. Rome, Italy, 2015: 201-209.
[7] SATIJA H, PINEAU J. Simultaneous machine translation using deep reinforcement learning[C]∥Proceedings of the Workshops of the International Conference on Machine Learning. New York, USA, 2016: 110-119.
[8] OH J, GUO X, LEE H, et al. Action-conditional video prediction using deep networks in Atari games[C]∥Advances in Neural Information Processing Systems. 2015: 2863-2871.
[9] GUO H. Generating text with deep reinforcement learning[C]∥Proceedings of the Workshops of Advances in Neural Information Processing Systems. Montreal, Canada, 2015: 1-9.
[10] LI J, MONROE W, RITTER A, et al. Deep reinforcement learning for dialogue generation[C]∥Proceedings of the Conference on Empirical Methods in Natural Language Processing. Austin, USA, 2016: 1192-1202.
[11] NARASIMHAN K, KULKARNI T, BARZILAY R. Language Understanding for Text-based Games Using Deep Reinforcement Learning. Computer Science, 2015, 40(4): 1-5.
[12] SALLAB A, ABDOU M, PEROT E, et al. Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017, 2017(19): 70-76.
[13] CAICEDO J, LAZEBNIK S. Active Object Localization with Deep Reinforcement Learning[C]∥IEEE International Conference on Computer Vision. IEEE, 2015: 2488-2496.
[14] ZHAO D B, SHAO K, ZHU Y H, et al. Review of deep reinforcement learning and discussions on the development of computer Go. Control Theory and Applications, 2016, 33(6): 701-717. (in Chinese)
赵冬斌, 邵坤, 朱圆恒, 等. 深度强化学习综述: 兼论计算机围棋的发展. 控制理论与应用, 2016, 33(6): 701-717.
[15] HINTON G, SALAKHUTDINOV R. Reducing the Dimensionality of Data with Neural Networks. Science, 2006, 313(5786): 504-507.
[16] DENG L, YU D. Deep learning: methods and applications. Foundations and Trends in Signal Processing, 2014, 7(3/4): 197-387.
[17] BENGIO Y, LECUN Y. Scaling learning algorithms towards AI. Large-Scale Kernel Machines, 2007, 34(5): 1-41.
[18] HINTON G, OSINDERO S, TEH Y. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554.
[19] HOCHREITER S, SCHMIDHUBER J. Long Short-Term Memory. Neural Computation, 1997, 9(8): 1735-1780.
[20] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]∥Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha: Association for Computational Linguistics, 2014: 1724-1734.
[21] GAO Y, CHEN S F, LU X. Research on Reinforcement Learning Technology: A Review. Acta Automatica Sinica, 2004, 30(1): 86-100. (in Chinese)
高阳, 陈世福, 陆鑫. 强化学习研究综述. 自动化学报, 2004, 30(1): 86-100.
[22] WATKINS C. Learning from delayed rewards. Cambridge: King's College, 1989.
[23] WILLIAMS R. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992, 8(3/4): 229-256.
[24] KONDA V, TSITSIKLIS J. Actor-critic algorithms[C]∥Advances in Neural Information Processing Systems. 2000: 1008-1014.
[25] LANGE S, RIEDMILLER M. Deep auto-encoder neural networks in reinforcement learning[C]∥The 2010 International Joint Conference on Neural Networks (IJCNN). IEEE, 2010: 1-8.
[26] LANGE S, RIEDMILLER M, VOIGTLÄNDER A. Autonomous reinforcement learning on raw visual input data in a real world application[C]∥International Joint Conference on Neural Networks. IEEE, 2012: 1-8.
[27] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[C]∥Proceedings of the Workshops at the 26th Neural Information Processing Systems 2013. Lake Tahoe, USA, 2013: 201-220.
[28] HASSELT H, GUEZ A, SILVER D. Deep Reinforcement Learning with Double Q-Learning[C]∥AAAI. 2016: 2094-2100.
[29] WANG Z, FREITAS N, LANCTOT M. Dueling network architectures for deep reinforcement learning[C]∥Proceedings of the International Conference on Machine Learning. New York, USA, 2016: 1995-2003.
[30] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[C]∥Proceedings of the 4th International Conference on Learning Representations. San Juan, Puerto Rico, 2016: 322-355.
[31] OSBAND I, BLUNDELL C, PRITZEL A, et al. Deep exploration via bootstrapped DQN[C]∥Advances in Neural Information Processing Systems. 2016: 4026-4034.
[32] HASSELT H, GUEZ A, HESSEL M, et al. Learning functions across many orders of magnitudes[C]∥Proceedings of the Advances in Neural Information Processing Systems. Barcelona, Spain, 2016: 80-99.
[33] LAKSHMINARAYANAN A, SHARMA S, RAVINDRAN B. Dynamic frame skip deep Q-network[C]∥Proceedings of the Workshops at the International Joint Conference on Artificial Intelligence. New York, USA, 2016.
[34] MUNOS R, STEPLETON T, HARUTYUNYAN A, et al. Safe and efficient off-policy reinforcement learning[C]∥Advances in Neural Information Processing Systems. 2016: 1054-1062.
[35] FRANÇOIS-LAVET V, FONTENEAU R, ERNST D. How to discount deep reinforcement learning: towards new dynamic strategies[C]∥Proceedings of the Workshops at the Advances in Neural Information Processing Systems. Montreal, Canada, 2015: 107-1160.
[36] LILLICRAP T, HUNT J, PRITZEL A, et al. Continuous control with deep reinforcement learning. https://arxiv.org/abs/1509.02971.
[37] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]∥Proceedings of the 31st International Conference on Machine Learning. 2014: 387-395.
[38] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal Policy Optimization Algorithms. https://arxiv.org/abs/1707.06347.
[39] HEESS N, DHRUVA T, SRIRAM S, et al. Emergence of Locomotion Behaviours in Rich Environments. https://arxiv.org/abs/1707.02286.
[40] SCHULMAN J, LEVINE S, MORITZ P, et al. Trust Region Policy Optimization[C]∥International Conference on Machine Learning. Lille: International Machine Learning Society, 2015: 1889-1897.
[41] ZHANG T, KAHN G, LEVINE S, et al. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search[C]∥2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016: 528-535.
[42] DUAN Y, CHEN X, HOUTHOOFT R, et al. Benchmarking deep reinforcement learning for continuous control[C]∥International Conference on Machine Learning. 2016: 1329-1338.
[43] BALDUZZI D, GHIFARY M. Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies. https://arxiv.org/abs/1509.03005.
[44] HEESS N, WAYNE G, SILVER D, et al. Learning continuous control policies by stochastic value gradients[C]∥Advances in Neural Information Processing Systems. 2015: 2944-2952.
[45] MNIH V, BADIA A, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]∥International Conference on Machine Learning. 2016: 1928-1937.
[46] JADERBERG M, MNIH V, CZARNECKI W, et al. Reinforcement learning with unsupervised auxiliary tasks. https://arxiv.org/abs/1611.05397.
[47] FINN C, LEVINE S, ABBEEL P. Guided cost learning: Deep inverse optimal control via policy optimization[C]∥International Conference on Machine Learning. 2016: 49-58.
[48] OH J, CHOCKALINGAM V, SINGH S, et al. Control of memory, active perception, and action in Minecraft[C]∥Proceedings of the International Conference on Machine Learning. New York, USA, 2016: 2790-2799.
[49] KULKARNI T, NARASIMHAN K, SAEEDI A, et al. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation[C]∥Advances in Neural Information Processing Systems. 2016: 3675-3683.
[50] HOUTHOOFT R, CHEN X, DUAN Y, et al. VIME: Variational information maximizing exploration[C]∥Advances in Neural Information Processing Systems. 2016: 1109-1117.
[51] FERNÁNDEZ F, VELOSO M. Probabilistic policy reuse in a reinforcement learning agent[C]∥Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems. Istanbul, Turkey, 2015: 720-727.
[52] BELLEMARE M, SRINIVASAN S, OSTROVSKI G, et al. Unifying count-based exploration and intrinsic motivation[C]∥Proceedings of the Conference on Neural Information Processing Systems. Barcelona, Spain, 2016: 1471-1479.
[53] SCHAUL T, HORGAN D, GREGOR K, et al. Universal value function approximators[C]∥Proceedings of the 32nd International Conference on Machine Learning. Lugano, Switzerland, 2015: 1312-1320.
[54] LAMPLE G, CHAPLOT D. Playing FPS Games with Deep Reinforcement Learning[C]∥AAAI. 2017: 2140-2146.
[55] KEMPKA M, WYDMUCH M, RUNC G, et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning[C]∥2016 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 2016: 1-8.
[56] VINYALS O, EWALDS T, BARTUNOV S, et al. StarCraft II: A New Challenge for Reinforcement Learning. https://arxiv.org/abs/1708.04782.
[57] ZHU Y, MOTTAGHI R, KOLVE E, et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning[C]∥2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017: 3357-3364.
[58] SUTSKEVER I, VINYALS O, LE Q. Sequence to sequence learning with neural networks[C]∥Advances in Neural Information Processing Systems. 2014: 3104-3112.
[59] LI J, MONROE W, RITTER A, et al. Deep reinforcement learning for dialogue generation. https://arxiv.org/abs/1707.06347.
[60] PARISOTTO E, BA J, SALAKHUTDINOV R. Actor-mimic: deep multitask and transfer reinforcement learning[C]∥Proceedings of the International Conference on Learning Representations. San Juan, Puerto Rico, 2016: 156-171.
[61] CHEN X G, YU Y. Reinforcement Learning and Its Application to the Game of Go. Acta Automatica Sinica, 2016, 42(5): 685-695. (in Chinese)
陈兴国, 俞扬. 强化学习及其在电脑围棋中的应用. 自动化学报, 2016, 42(5): 685-695.