Computer Science ›› 2022, Vol. 49 ›› Issue (5): 179-185. doi: 10.11896/jsjkx.210300084
• Artificial Intelligence •
ZHANG Jia-neng, LI Hui, WU Hao-lin, WANG Zhuang
[1]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[2]SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of Go without human knowledge[J].Nature,2017,550(7676):354-359.
[3]KOBER J,BAGNELL J A,PETERS J.Reinforcement learning in robotics:A survey[J].The International Journal of Robotics Research,2013,32(11):1238-1274.
[4]GREGURIĆ M,VUJIĆ M,ALEXOPOULOS C,et al.Application of Deep Reinforcement Learning in Traffic Signal Control:An Overview and Impact of Open Traffic Data[J].Applied Sciences,2020,10(11):4011-4036.
[5]SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized Experience Replay[C]//International Conference on Learning Representations.2016.
[6]LIN L J.Self-improving reactive agents based on reinforcement learning,planning and teaching[J].Machine Learning,1992,8(3/4):293-321.
[7]ZHAO Y N,LIU P,ZHAO W,et al.Twice Sampling Method in Deep Q-network[J].Acta Automatica Sinica,2019,45(10):1870-1882.
[8]CAO X,WAN H,LIN Y,et al.High-Value Prioritized Experience Replay for Off-Policy Reinforcement Learning[C]//2019 IEEE 31st International Conference on Tools with Artificial Intelligence.IEEE,2019:1510-1514.
[9]ZHU F,WU W,LIU Q,et al.A Deep Q-Network Method Based on Upper Confidence Bound Experience Sampling[J].Journal of Computer Research and Development,2018,55(8):1694-1705.
[10]NOVATI G,KOUMOUTSAKOS P.Remember and forget for experience replay[C]//International Conference on Machine Learning.2019:4851-4860.
[11]SUN P,ZHOU W,LI H.Attentive Experience Replay[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:5900-5907.
[12]BU F,CHANG D E.Double Prioritized State Recycled Experience Replay[C]//IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia).2020:1-6.
[13]BRUIN T D,KOBER J,TUYLS K,et al.Experience Selection in Deep Reinforcement Learning for Control[J].Journal of Machine Learning Research,2018,19:1-56.
[14]BROCKMAN G,CHEUNG V,PETTERSSON L,et al.OpenAI Gym[EB/OL].https://arxiv.org/abs/1606.01540.
[15]SUTTON R,BARTO A.Reinforcement learning:An introduction[M].Massachusetts:MIT Press,2018.
[16]LIU Q,ZHAI J W,ZHANG Z C,et al.A Survey on Deep Reinforcement Learning[J].Chinese Journal of Computers,2018,41(1):1-27.
[17]HAARNOJA T,ZHOU A,ABBEEL P,et al.Soft Actor-Critic:Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor[C]//International Conference on Machine Learning.2018:1861-1870.
[18]WU H L,CAI L C,GAO X.Online pheromone stringency guiding heuristically accelerated Q-learning[J].Application Research of Computers,2018,35(8):2323-2327.
[19]HUANG Z Y,WU H L,WANG Z,et al.DQN Algorithm Based on Averaged Neural Network Parameters[J].Computer Science,2021,48(4):223-228.
[20]TODOROV E,EREZ T,TASSA Y.MuJoCo:A physics engine for model-based control[C]//International Conference on Intelligent Robots and Systems.2012:5026-5033.
[1] LIU Xing-guang, ZHOU Li, LIU Yan, ZHANG Xiao-ying, TAN Xiang, WEI Ji-bo. Construction and Distribution Method of REM Based on Edge Intelligence[J]. Computer Science, 2022, 49(9): 236-241.
[2] YUAN Wei-lin, LUO Jun-ren, LU Li-na, CHEN Jia-xing, ZHANG Wan-peng, CHEN Jing. Methods in Adversarial Intelligent Game: A Holistic Comparative Analysis from Perspective of Game Theory and Reinforcement Learning[J]. Computer Science, 2022, 49(8): 191-204.
[3] SHI Dian-xi, ZHAO Chen-ran, ZHANG Yao-wen, YANG Shao-wu, ZHANG Yong-jun. Adaptive Reward Method for End-to-End Cooperation Based on Multi-agent Reinforcement Learning[J]. Computer Science, 2022, 49(8): 247-256.
[4] YU Bin, LI Xue-hua, PAN Chun-yu, LI Na. Edge-Cloud Collaborative Resource Allocation Algorithm Based on Deep Reinforcement Learning[J]. Computer Science, 2022, 49(7): 248-253.
[5] LI Meng-fei, MAO Ying-chi, TU Zi-jian, WANG Xuan, XU Shu-fang. Server-reliability Task Offloading Strategy Based on Deep Deterministic Policy Gradient[J]. Computer Science, 2022, 49(7): 271-279.
[6] GUO Yu-xin, CHEN Xiu-hong. Automatic Summarization Model Combining BERT Word Embedding Representation and Topic Information Enhancement[J]. Computer Science, 2022, 49(6): 313-318.
[7] FAN Jing-yu, LIU Quan. Off-policy Maximum Entropy Deep Reinforcement Learning Algorithm Based on Randomly Weighted Triple Q-Learning[J]. Computer Science, 2022, 49(6): 335-341.
[8] XIE Wan-cheng, LI Bin, DAI Yue-yue. PPO Based Task Offloading Scheme in Aerial Reconfigurable Intelligent Surface-assisted Edge Computing[J]. Computer Science, 2022, 49(6): 3-11.
[9] HONG Zhi-li, LAI Jun, CAO Lei, CHEN Xi-liang, XU Zhi-xiong. Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration[J]. Computer Science, 2022, 49(6): 149-157.
[10] LI Peng, YI Xiu-wen, QI De-kang, DUAN Zhe-wen, LI Tian-rui. Heating Strategy Optimization Method Based on Deep Learning[J]. Computer Science, 2022, 49(4): 263-268.
[11] OUYANG Zhuo, ZHOU Si-yuan, LYU Yong, TAN Guo-ping, ZHANG Yue, XIANG Liang-liang. DRL-based Vehicle Control Strategy for Signal-free Intersections[J]. Computer Science, 2022, 49(3): 46-51.
[12] ZHOU Qin, LUO Fei, DING Wei-chao, GU Chun-hua, ZHENG Shuai. Double Speedy Q-Learning Based on Successive Over Relaxation[J]. Computer Science, 2022, 49(3): 239-245.
[13] LI Su, SONG Bao-yan, LI Dong, WANG Jun-lu. Composite Blockchain Associated Event Tracing Method for Financial Activities[J]. Computer Science, 2022, 49(3): 346-353.
[14] HUANG Xin-quan, LIU Ai-jun, LIANG Xiao-hu, WANG Heng. Load-balanced Geographic Routing Protocol in Aerial Sensor Network[J]. Computer Science, 2022, 49(2): 342-352.
[15] AO Tian-yu, LIU Quan. Upper Confidence Bound Exploration with Fast Convergence[J]. Computer Science, 2022, 49(1): 298-305.