Computer Science, 2014, Vol. 41, Issue (6): 239-242. doi: 10.11896/j.issn.1002-137X.2014.06.047

• Artificial Intelligence •

Actor-Critic Algorithm Based on Tile Coding and Model Learning

JIN Yu-jing, ZHU Wen-wen, FU Yu-chen and LIU Quan

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61070122, 4, 61070223, 61103045), the Natural Science Foundation of Jiangsu Province (BK2009116), and the Natural Science Research Project of Jiangsu Higher Education Institutions (09KJA520002).

Abstract: Actor-Critic (AC) methods are a class of reinforcement learning methods with good performance and convergence guarantees. However, the agent does not learn the dynamics of the environment while it learns and improves its policy, which limits the performance of AC methods to some extent. In addition, AC methods must represent the policy and the value function approximately, and the method used to encode states and actions, together with its parameters, has an important influence on their performance. Tile Coding is simple to use and has low computational time complexity, so we combine Tile Coding with a model-based Actor-Critic method and apply the resulting algorithm to reinforcement learning simulation experiments. The experimental results show that the resulting algorithm performs well.

Key words: Reinforcement learning, Tile Coding, Actor-Critic, Model learning, Function approximation

CLC number: TP181  Document code: A
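The abstract relies on Tile Coding to encode continuous states for an approximate value function. The following is a minimal sketch of that kind of encoding, not the authors' implementation. The function names (`tile_indices`, `value`, `td0_update`), the uniformly shifted tilings, the grid sizes, and the simple TD(0) critic update are all assumptions chosen for illustration. The paper's model-learning component is not shown.

```python
import math


def tile_indices(state, lows, highs, num_tilings=8, tiles_per_dim=8):
    """Return one active tile index per tiling for a continuous state.

    Assumption: each tiling is a uniform grid shifted by t / num_tilings
    of a tile width along every dimension.
    """
    dims = len(state)
    # The shift can push a point into one extra tile, hence the +1.
    tiles_per_tiling = (tiles_per_dim + 1) ** dims
    active = []
    for t in range(num_tilings):
        index = 0
        for x, lo, hi in zip(state, lows, highs):
            # Scale the coordinate to [0, tiles_per_dim], then shift the grid.
            scaled = (x - lo) / (hi - lo) * tiles_per_dim
            coord = int(math.floor(scaled + t / num_tilings))
            # Clamp so out-of-range states still map to a valid tile.
            coord = min(max(coord, 0), tiles_per_dim)
            index = index * (tiles_per_dim + 1) + coord
        active.append(t * tiles_per_tiling + index)
    return active


def value(weights, active):
    """Linear value estimate: the sum of the weights of the active tiles."""
    return sum(weights[i] for i in active)


def td0_update(weights, active, reward, next_active, alpha=0.1, gamma=0.95):
    """One TD(0) critic step on the active tiles; returns the TD error."""
    delta = reward + gamma * value(weights, next_active) - value(weights, active)
    step = alpha / len(active)  # split the step size across the tilings
    for i in active:
        weights[i] += step * delta
    return delta
```

With two state dimensions and the default settings, the weight vector needs `8 * (8 + 1) ** 2` entries. Nearby states activate mostly the same tiles, so an update for one state generalizes to its neighbors, while distant states share no tiles and are left unaffected.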

