Computer Science ›› 2018, Vol. 45 ›› Issue (7): 1-6.doi: 10.11896/j.issn.1002-137X.2018.07.001
• CCF Big Data 2017 •
ZHAO Xing-yu1,DING Shi-fei1,2
[1] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[2] SILVER D, HUANG A, MADDISON C, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[3] LEVINE S, PASTOR P, KRIZHEVSKY A, et al. Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection[C]∥International Symposium on Experimental Robotics. Springer, Cham, 2016: 173-184.
[4] ZHANG M, MCCARTHY Z, FINN C, et al. Learning deep neural network policies with continuous memory states[C]∥Proceedings of the International Conference on Robotics and Automation. Stockholm, Sweden, 2016: 520-527.
[5] LEVINE S, FINN C, DARRELL T, et al. End-to-end training of deep visuomotor policies[J]. Journal of Machine Learning Research, 2016, 17(39): 1-40.
[6] LENZ I, KNEPPER R, SAXENA A. DeepMPC: learning deep latent features for model predictive control[C]∥Proceedings of Robotics: Science and Systems. Rome, Italy, 2015: 201-209.
[7] SATIJA H, PINEAU J. Simultaneous machine translation using deep reinforcement learning[C]∥Proceedings of the Workshops of the International Conference on Machine Learning. New York, USA, 2016: 110-119.
[8] OH J, GUO X, LEE H, et al. Action-conditional video prediction using deep networks in Atari games[C]∥Advances in Neural Information Processing Systems. 2015: 2863-2871.
[9] GUO H. Generating text with deep reinforcement learning[C]∥Proceedings of the Workshops of Advances in Neural Information Processing Systems. Montreal, Canada, 2015: 1-9.
[10] LI J, MONROE W, RITTER A, et al. Deep reinforcement learning for dialogue generation[C]∥Proceedings of the Conference on Empirical Methods in Natural Language Processing. Austin, USA, 2016: 1192-1202.
[11] NARASIMHAN K, KULKARNI T, BARZILAY R. Language Understanding for Text-based Games Using Deep Reinforcement Learning[J]. Computer Science, 2015, 40(4): 1-5.
[12] SALLAB A, ABDOU M, PEROT E, et al. Deep reinforcement learning framework for autonomous driving[J]. Electronic Imaging, 2017, 2017(19): 70-76.
[13] CAICEDO J, LAZEBNIK S. Active Object Localization with Deep Reinforcement Learning[C]∥IEEE International Conference on Computer Vision. IEEE, 2015: 2488-2496.
[14] ZHAO D B, SHAO K, ZHU Y H, et al. Review of deep reinforcement learning and discussions on the development of computer Go[J]. Control Theory and Applications, 2016, 33(6): 701-717. (in Chinese)
赵冬斌,邵坤,朱圆恒,等.深度强化学习综述:兼论计算机围棋的发展[J].控制理论与应用,2016,33(6):701-717.
[15] HINTON G, SALAKHUTDINOV R. Reducing the Dimensionality of Data with Neural Networks[J]. Science, 2006, 313(5786): 504-507.
[16] DENG L, YU D. Deep learning: methods and applications[J]. Foundations and Trends in Signal Processing, 2014, 7(3/4): 197-387.
[17] BENGIO Y, LECUN Y. Scaling learning algorithms towards AI[J]. Large-scale Kernel Machines, 2007, 34(5): 1-41.
[18] HINTON G, OSINDERO S, TEH Y. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[19] HOCHREITER S, SCHMIDHUBER J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[20] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]∥Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha: Association for Computational Linguistics, 2014: 1724-1734.
[21] GAO Y, CHEN S F, LU X. Research on Reinforcement Learning Technology: A Review[J]. Acta Automatica Sinica, 2004, 30(1): 86-100. (in Chinese)
高阳,陈世福,陆鑫.强化学习研究综述[J].自动化学报,2004,30(1):86-100.
[22] WATKINS C. Learning from delayed rewards[D]. Cambridge: King's College, 1989.
[23] WILLIAMS R. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3/4): 229-256.
[24] KONDA V, TSITSIKLIS J. Actor-critic algorithms[C]∥Advances in Neural Information Processing Systems. 2000: 1008-1014.
[25] LANGE S, RIEDMILLER M. Deep auto-encoder neural networks in reinforcement learning[C]∥The 2010 International Joint Conference on Neural Networks (IJCNN). IEEE, 2010: 1-8.
[26] LANGE S, RIEDMILLER M, VOIGTLÄNDER A. Autonomous reinforcement learning on raw visual input data in a real world application[C]∥International Joint Conference on Neural Networks. IEEE, 2012: 1-8.
[27] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[C]∥Proceedings of the Workshops at the 26th Neural Information Processing Systems 2013. Lake Tahoe, USA, 2013: 201-220.
[28] HASSELT H, GUEZ A, SILVER D. Deep Reinforcement Learning with Double Q-Learning[C]∥AAAI. 2016: 2094-2100.
[29] WANG Z, FREITAS N, LANCTOT M. Dueling network architectures for deep reinforcement learning[C]∥Proceedings of the International Conference on Machine Learning. New York, USA, 2016: 1995-2003.
[30] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[C]∥Proceedings of the 4th International Conference on Learning Representations. San Juan, Puerto Rico, 2016: 322-355.
[31] OSBAND I, BLUNDELL C, PRITZEL A, et al. Deep exploration via bootstrapped DQN[C]∥Advances in Neural Information Processing Systems. 2016: 4026-4034.
[32] HASSELT H, GUEZ A, HESSEL M, et al. Learning values across many orders of magnitude[C]∥Proceedings of the Advances in Neural Information Processing Systems. Barcelona, Spain, 2016: 80-99.
[33] LAKSHMINARAYANAN A, SHARMA S, RAVINDRAN B. Dynamic frame skip deep Q-network[C]∥Proceedings of the Workshops at the International Joint Conference on Artificial Intelligence. New York, USA, 2016.
[34] MUNOS R, STEPLETON T, HARUTYUNYAN A, et al. Safe and efficient off-policy reinforcement learning[C]∥Advances in Neural Information Processing Systems. 2016: 1054-1062.
[35] FRANÇOIS-LAVET V, FONTENEAU R, ERNST D. How to discount deep reinforcement learning: towards new dynamic strategies[C]∥Proceedings of the Workshops at the Advances in Neural Information Processing Systems. Montreal, Canada, 2015: 107-1160.
[36] LILLICRAP T, HUNT J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. https://arxiv.org/abs/1509.02971.
[37] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]∥Proceedings of the 31st International Conference on Machine Learning. 2014: 387-395.
[38] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal Policy Optimization Algorithms[EB/OL]. https://arxiv.org/abs/1707.06347.
[39] HEESS N, DHRUVA T, SRIRAM S, et al. Emergence of Locomotion Behaviours in Rich Environments[EB/OL]. https://arxiv.org/abs/1707.02286.
[40] SCHULMAN J, LEVINE S, MORITZ P, et al. Trust Region Policy Optimization[C]∥International Conference on Machine Learning. Lille: International Machine Learning Society, 2015: 1889-1897.
[41] ZHANG T, KAHN G, LEVINE S, et al. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search[C]∥2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016: 528-535.
[42] DUAN Y, CHEN X, HOUTHOOFT R, et al. Benchmarking deep reinforcement learning for continuous control[C]∥International Conference on Machine Learning. 2016: 1329-1338.
[43] BALDUZZI D, GHIFARY M. Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies[EB/OL]. https://arxiv.org/abs/1509.03005.
[44] HEESS N, WAYNE G, SILVER D, et al. Learning continuous control policies by stochastic value gradients[C]∥Advances in Neural Information Processing Systems. 2015: 2944-2952.
[45] MNIH V, BADIA A, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]∥International Conference on Machine Learning. 2016: 1928-1937.
[46] JADERBERG M, MNIH V, CZARNECKI W, et al. Reinforcement learning with unsupervised auxiliary tasks[EB/OL]. https://arxiv.org/abs/1611.05397.
[47] FINN C, LEVINE S, ABBEEL P. Guided cost learning: Deep inverse optimal control via policy optimization[C]∥International Conference on Machine Learning. 2016: 49-58.
[48] OH J, CHOCKALINGAM V, SINGH S, et al. Control of memory, active perception, and action in Minecraft[C]∥Proceedings of the International Conference on Machine Learning. New York, USA, 2016: 2790-2799.
[49] KULKARNI T, NARASIMHAN K, SAEEDI A, et al. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation[C]∥Advances in Neural Information Processing Systems. 2016: 3675-3683.
[50] HOUTHOOFT R, CHEN X, DUAN Y, et al. VIME: Variational information maximizing exploration[C]∥Advances in Neural Information Processing Systems. 2016: 1109-1117.
[51] FERNÁNDEZ F, VELOSO M. Probabilistic policy reuse in a reinforcement learning agent[C]∥Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems. Istanbul, Turkey, 2015: 720-727.
[52] BELLEMARE M, SRINIVASAN S, OSTROVSKI G, et al. Unifying count-based exploration and intrinsic motivation[C]∥Proceedings of the Conference on Neural Information Processing Systems. Barcelona, Spain, 2016: 1471-1479.
[53] SCHAUL T, HORGAN D, GREGOR K, et al. Universal value function approximators[C]∥Proceedings of the 32nd International Conference on Machine Learning. Lugano, Switzerland, 2015: 1312-1320.
[54] LAMPLE G, CHAPLOT D. Playing FPS Games with Deep Reinforcement Learning[C]∥AAAI. 2017: 2140-2146.
[55] KEMPKA M, WYDMUCH M, RUNC G, et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning[C]∥2016 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 2016: 1-8.
[56] VINYALS O, EWALDS T, BARTUNOV S, et al. StarCraft II: A New Challenge for Reinforcement Learning[EB/OL]. https://arxiv.org/abs/1708.04782.
[57] ZHU Y, MOTTAGHI R, KOLVE E, et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning[C]∥2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017: 3357-3364.
[58] SUTSKEVER I, VINYALS O, LE Q. Sequence to sequence learning with neural networks[C]∥Advances in Neural Information Processing Systems. 2014: 3104-3112.
[59] LI J, MONROE W, RITTER A, et al. Deep reinforcement learning for dialogue generation[EB/OL]. https://arxiv.org/abs/1606.01541.
[60] PARISOTTO E, BA J, SALAKHUTDINOV R. Actor-mimic: deep multitask and transfer reinforcement learning[C]∥Proceedings of the International Conference on Learning Representations. San Juan, Puerto Rico, 2016: 156-171.
[61] CHEN X G, YU Y. Reinforcement Learning and Its Application to the Game of Go[J]. Acta Automatica Sinica, 2016, 42(5): 685-695. (in Chinese)
陈兴国,俞扬.强化学习及其在电脑围棋中的应用[J].自动化学报,2016,42(5):685-695.
[1] XU Yong-xin, ZHAO Jun-feng, WANG Ya-sha, XIE Bing, YANG Kai. Temporal Knowledge Graph Representation Learning [J]. Computer Science, 2022, 49(9): 162-171.
[2] RAO Zhi-shuang, JIA Zhen, ZHANG Fan, LI Tian-rui. Key-Value Relational Memory Networks for Question Answering over Knowledge Graph [J]. Computer Science, 2022, 49(9): 202-207.
[3] LIU Xing-guang, ZHOU Li, LIU Yan, ZHANG Xiao-ying, TAN Xiang, WEI Ji-bo. Construction and Distribution Method of REM Based on Edge Intelligence [J]. Computer Science, 2022, 49(9): 236-241.
[4] TANG Ling-tao, WANG Di, ZHANG Lu-fei, LIU Sheng-yun. Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy [J]. Computer Science, 2022, 49(9): 297-305.
[5] SUN Qi, JI Gen-lin, ZHANG Jie. Non-local Attention Based Generative Adversarial Network for Video Abnormal Event Detection [J]. Computer Science, 2022, 49(8): 172-177.
[6] YUAN Wei-lin, LUO Jun-ren, LU Li-na, CHEN Jia-xing, ZHANG Wan-peng, CHEN Jing. Methods in Adversarial Intelligent Game: A Holistic Comparative Analysis from Perspective of Game Theory and Reinforcement Learning [J]. Computer Science, 2022, 49(8): 191-204.
[7] SHI Dian-xi, ZHAO Chen-ran, ZHANG Yao-wen, YANG Shao-wu, ZHANG Yong-jun. Adaptive Reward Method for End-to-End Cooperation Based on Multi-agent Reinforcement Learning [J]. Computer Science, 2022, 49(8): 247-256.
[8] WANG Jian, PENG Yu-qi, ZHAO Yu-fei, YANG Jian. Survey of Social Network Public Opinion Information Extraction Based on Deep Learning [J]. Computer Science, 2022, 49(8): 279-293.
[9] HAO Zhi-rong, CHEN Long, HUANG Jia-cheng. Class Discriminative Universal Adversarial Attack for Text Classification [J]. Computer Science, 2022, 49(8): 323-329.
[10] JIANG Meng-han, LI Shao-mei, ZHENG Hong-hao, ZHANG Jian-peng. Rumor Detection Model Based on Improved Position Embedding [J]. Computer Science, 2022, 49(8): 330-335.
[11] HU Yan-yu, ZHAO Long, DONG Xiang-jun. Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification [J]. Computer Science, 2022, 49(7): 73-78.
[12] CHENG Cheng, JIANG Ai-lian. Real-time Semantic Segmentation Method Based on Multi-path Feature Extraction [J]. Computer Science, 2022, 49(7): 120-126.
[13] HOU Yu-tao, ABULIZI Abudukelimu, ABUDUKELIMU Halidanmu. Advances in Chinese Pre-training Models [J]. Computer Science, 2022, 49(7): 148-163.
[14] ZHOU Hui, SHI Hao-chen, TU Yao-feng, HUANG Sheng-jun. Robust Deep Neural Network Learning Based on Active Sampling [J]. Computer Science, 2022, 49(7): 164-169.
[15] SU Dan-ning, CAO Gui-tao, WANG Yan-nan, WANG Hong, REN He. Survey of Deep Learning for Radar Emitter Identification Based on Small Sample [J]. Computer Science, 2022, 49(7): 226-235.