Computer Science, 2021, Vol. 48, Issue (10): 37-43. doi: 10.11896/jsjkx.200900208
• Artificial Intelligence •
ZHANG Jian-hang1, LIU Quan1,2,3,4