Computer Science, 2021, Vol. 48, Issue (6): 168-174. doi: 10.11896/jsjkx.200600133
Artificial Intelligence
LU Jia-you (1), LING Xing-hong (1, 2), LIU Quan (1), ZHU Fei (1)
[1]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[2]SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[3]VINYALS O,BABUSCHKIN I,CZARNECKI W M,et al.Grandmaster level in StarCraft II using multi-agent reinforcement learning[J].Nature,2019,575(7782):350-354.
[4]SCHMIDHUBER J.Evolutionary principles in self-referential learning[D].Munich:Univ.Munich,1987.
[5]BENGIO Y,BENGIO S,CLOUTIER J.Learning a synaptic learning rule[C]//IJCNN-91-Seattle International Joint Conference on Neural Networks.IEEE,2002.
[6]WANG J X,KURTH-NELSON Z,TIRUMALA D,et al.Learning to reinforcement learn[C]//CogSci.2016.
[7]DUAN Y,SCHULMAN J,CHEN X,et al.RL²:Fast Reinforcement Learning via Slow Reinforcement Learning[C]//International Conference on Learning Representations.2017.
[8]MISHRA N,ROHANINEJAD M,CHEN X,et al.A Simple Neural Attentive Meta-Learner[C]//International Conference on Learning Representations.2018.
[9]FINN C,ABBEEL P,LEVINE S.Model-agnostic meta-learning for fast adaptation of deep networks[C]//Proceedings of the 34th International Conference on Machine Learning.2017:1126-1135.
[10]GUPTA A,MENDONCA R,LIU Y,et al.Meta-reinforcement learning of structured exploration strategies[C]//Advances in Neural Information Processing Systems.2018:5302-5311.
[11]ROTHFUSS J,LEE D,CLAVERA I,et al.ProMP:Proximal Meta-Policy Search[C]//International Conference on Learning Representations.2019.
[12]RAJESWARAN A,FINN C,KAKADE S M,et al.Meta-learning with implicit gradients[C]//Advances in Neural Information Processing Systems.2019:113-124.
[13]RAKELLY K,ZHOU A,FINN C,et al.Efficient off-policy meta-reinforcement learning via probabilistic context variables[C]//International Conference on Machine Learning.2019:5331-5340.
[14]HAARNOJA T,ZHOU A,ABBEEL P,et al.Soft Actor-Critic:Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor[C]//International Conference on Machine Learning.2018:1856-1865.
[15]ZIEBART B D,MAAS A L,BAGNELL J A,et al.Maximum entropy inverse reinforcement learning[C]//Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence.Chicago,Illinois,USA,2008:13-17.
[16]WANG H,ZHOU J,HE X.Learning Context-aware Task Reasoning for Efficient Meta-reinforcement Learning[J].arXiv:2003.01373,2020.
[17]MONTAGUE P R.Reinforcement learning:an introduction,by Sutton,RS and Barto,AG[J].Trends in Cognitive Sciences,1999,3(9):360.
[18]KINGMA D P,WELLING M.Auto-Encoding Variational Bayes[C]//International Conference on Learning Representations.2014.
[19]ALEMI A A,FISCHER I,DILLON J V,et al.Deep Variational Information Bottleneck[C]//International Conference on Learning Representations.2017.
[20]EYSENBACH B,LEVINE S.If MaxEnt RL is the Answer,What is the Question?[J].arXiv:1910.01913,2019.
[21]MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]//International Conference on Machine Learning.2016:1928-1937.
[22]HAARNOJA T,TANG H,ABBEEL P,et al.Reinforcement learning with deep energy-based policies[C]//Proceedings of the 34th International Conference on Machine Learning.2017:1352-1361.
[23]FUJIMOTO S,VAN HOOF H,MEGER D.Addressing Function Approximation Error in Actor-Critic Methods[C]//International Conference on Machine Learning.2018:1582-1591.
[24]LILLICRAP T P,HUNT J J,PRITZEL A,et al.Continuous control with deep reinforcement learning[C]//International Conference on Learning Representations.2016.
[25]TODOROV E,EREZ T,TASSA Y.MuJoCo:A physics engine for model-based control[C]//2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.IEEE,2012:5026-5033.