Computer Science ›› 2021, Vol. 48 ›› Issue (12): 297-303.doi: 10.11896/jsjkx.201000163
• Artificial Intelligence •
SHEN Yi1, LIU Quan1,2,3,4