Computer Science, 2019, Vol. 46, Issue (6): 212-217. doi: 10.11896/j.issn.1002-137X.2019.06.032
LI Jian-guo1, ZHAO Hai-tao1, SUN Shao-yuan2