Actor-Critic Algorithm Based on Tile Coding and Model Learning

doi:10.11896/j.issn.1002-137X.2014.06.047

Abstract

Abstract: Actor-Critic (AC) methods are a class of reinforcement learning methods that perform well and guarantee convergence. However, the agent does not learn the dynamics of the environment while learning and improving its policy, which limits the performance of AC methods to some extent. In addition, AC methods must represent the policy and the value function approximately, and the way states and actions are encoded, together with the choice of parameters, strongly influences their performance. Tile Coding is simple and has low computational time complexity, so we combine Tile Coding with a model-based Actor-Critic method and apply the resulting algorithm to reinforcement learning simulation experiments. The results show that the algorithm performs well.

Key words: Reinforcement learning, Tile Coding, Actor-Critic, Model learning, Function approximation

JIN Yu-jing, ZHU Wen-wen, FU Yu-chen and LIU Quan. Actor-Critic Algorithm Based on Tile Coding and Model Learning[J]. Computer Science, 2014, 41(6): 239-242.
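
The abstract names Tile Coding as the encoding used for function approximation. The following is a minimal sketch of that general idea for a one-dimensional state, not the paper's implementation: the function names (`tile_indices`, `value`, `td_update`), the number of tilings and tiles, the uniform offsets, and the step size are all assumptions chosen for illustration. It shows how a state activates one tile per tiling and how a linear value estimate over those tiles can be nudged toward a target.

```python
import math

def tile_indices(x, low, high, n_tilings=8, n_tiles=10):
    """Return one active feature index per tiling for a scalar state x.

    Assumed layout: each tiling covers [low, high] with n_tiles + 1 tiles
    and is shifted by a fraction of one tile width relative to the others.
    """
    scaled = (x - low) / (high - low)
    scaled = min(max(scaled, 0.0), 1.0)  # clip to the covered range
    active = []
    for t in range(n_tilings):
        offset = t / (n_tilings * n_tiles)  # shift is smaller than one tile width
        tile = min(int(math.floor((scaled + offset) * n_tiles)), n_tiles)
        active.append(t * (n_tiles + 1) + tile)
    return active

def value(weights, active):
    """Linear value estimate: sum of the weights of the active tiles."""
    return sum(weights[i] for i in active)

def td_update(weights, active, target, alpha=0.1):
    """Move the estimate toward target, sharing the step across active tiles."""
    delta = target - value(weights, active)
    step = alpha / len(active)
    for i in active:
        weights[i] += step * delta
    return delta

# Usage: repeatedly update the estimate for one state toward a target of 1.0.
n_tilings, n_tiles = 8, 10
weights = [0.0] * (n_tilings * (n_tiles + 1))
active = tile_indices(0.37, 0.0, 1.0, n_tilings, n_tiles)
for _ in range(50):
    td_update(weights, active, target=1.0)
print(round(value(weights, active), 3))
```

Because only one tile per tiling is active, each update touches `n_tilings` weights regardless of how finely the state space is divided, which is one way to read the abstract's remark about low computational cost.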

