Computer Science, 2019, Vol. 46, Issue 5: 169-174. DOI: 10.11896/j.issn.1002-137X.2019.05.026
LI Jie1,2, LING Xing-hong1,2, FU Yu-chen1,2, LIU Quan1,2,3,4
[1]YU K,JIA L,CHEN Y Q,et al.Deep learning:yesterday,today,and tomorrow[J].Journal of Computer Research and Development,2013,50(9):1799-1804.(in Chinese)
[2]SUTTON R S,BARTO A G.Reinforcement learning:An introduction[M].Cambridge:MIT Press,1998.
[3]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing Atari with deep reinforcement learning[C]∥Proceedings of Workshops at the 26th Neural Information Processing Systems 2013.Lake Tahoe,USA,2013:201-220.
[4]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[5]WATKINS C J C H.Learning from Delayed Rewards[D].Cambridge:University of Cambridge,1989.
[6]VAN HASSELT H,GUEZ A,SILVER D.Deep reinforcement learning with double Q-learning[C]∥Proceedings of the 30th AAAI Conference on Artificial Intelligence.2016:2094-2100.
[7]SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized experience replay[C]∥Proceedings of the 4th International Conference on Learning Representations.San Juan,Puerto Rico,2016:322-355.
[8]RUMMERY G A,NIRANJAN M.On-line Q-learning using connectionist systems[D].Cambridge:University of Cambridge,1994.
[9]SUTTON R S.Generalization in reinforcement learning:successful examples using sparse coarse coding[C]∥International Conference on Neural Information Processing Systems.MIT Press,1995:1038-1044.
[10]MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]∥International Conference on Machine Learning.2016:1928-1937.
[11]BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[J].arXiv:1409.0473,2014.
[12]XU K,BA J,KIROS R,et al.Show,attend and tell:Neural image caption generation with visual attention[C]∥International Conference on Machine Learning.2015:2048-2057.
[13]BUSONIU L,BABUSKA R,DE SCHUTTER B,et al.Reinforcement learning and dynamic programming using function approximators[M].CRC Press,2010.
[14]WIERING M,OTTERLO M V.Reinforcement Learning:State-of-the-Art[M].Springer Publishing Company,Incorporated,2012.
[15]SUTTON R S,MCALLESTER D A,SINGH S P,et al.Policy gradient methods for reinforcement learning with function approximation[C]∥Advances in Neural Information Processing Systems.2000:1057-1063.
[16]KAKADE S.A natural policy gradient[C]∥International Conference on Neural Information Processing Systems:Natural and Synthetic.MIT Press,2001:1531-1538.
[17]SILVER D,LEVER G,HEESS N,et al.Deterministic policy gradient algorithms[C]∥International Conference on Machine Learning.2014:387-395.
[18]KONDA V R,TSITSIKLIS J N.Actor-critic algorithms[C]∥Advances in Neural Information Processing Systems.2000:1008-1014.
[19]BHATNAGAR S,GHAVAMZADEH M,LEE M,et al.Incremental natural actor-critic algorithms[C]∥Advances in Neural Information Processing Systems.2008:105-112.
[1] ZHANG Jia-neng, LI Hui, WU Hao-lin, WANG Zhuang. Exploration and Exploitation Balanced Experience Replay [J]. Computer Science, 2022, 49(5): 179-185.
[2] DAI Shan-shan, LIU Quan. Action Constrained Deep Reinforcement Learning Based Safe Automatic Driving Method [J]. Computer Science, 2021, 48(9): 235-243.
[3] YUAN Ye, HE Xiao-ge, ZHU Ding-kun, WANG Fu-lee, XIE Hao-ran, WANG Jun, WEI Ming-qiang, GUO Yan-wen. Survey of Visual Image Saliency Detection [J]. Computer Science, 2020, 47(7): 84-91.
[4] LI Li, ZHENG Jia-li, WANG Zhe, YUAN Yuan, SHI Jing. RFID Indoor Positioning Algorithm Based on Asynchronous Advantage Actor-Critic [J]. Computer Science, 2020, 47(2): 233-238.
[5] JIN Yu-jing, ZHU Wen-wen, FU Yu-chen, LIU Quan. Actor-Critic Algorithm Based on Tile Coding and Model Learning [J]. Computer Science, 2014, 41(6): 239-242.