Computer Science, 2020, Vol. 47, Issue 7: 179-185. doi: 10.11896/jsjkx.190500143
• Artificial Intelligence •
ZHENG Shuai, LUO Fei, GU Chun-hua, DING Wei-chao, LU Hai-feng