Computer Science, 2020, Vol. 47, Issue (12): 210-217. doi: 10.11896/jsjkx.191100084
LI Bin (1), LIU Quan (1, 2, 3, 4)