Computer Science, 2022, 49(3): 239-245. doi: 10.11896/jsjkx.201200173
• Artificial Intelligence •
ZHOU Qin, LUO Fei, DING Wei-chao, GU Chun-hua, ZHENG Shuai