Transfer Learning Algorithm between Keepaway Tasks Based on Policy Reuse

doi:10.11896/j.issn.1002-137X.2015.04.038

Abstract

In the RoboCup Keepaway task, players can learn good high-level strategies through reinforcement learning. However, because Keepaway tasks have very large state spaces, conventional reinforcement learning needs a great many search steps and a very long time to converge. To address this problem in the 5v4 Keepaway task, the policy reuse technique is applied to the reinforcement learning of the takers' high-level decisions in order to achieve transfer learning. A transfer scheme is designed, together with mappings between the state and action spaces of the 4v3 and 5v4 tasks, and a policy-reuse-based algorithm is then presented. Experiments show that, after the same training time on the 5v4 task, takers trained with transfer learning achieve shorter task completion times and higher ball-stealing success rates than takers trained with conventional reinforcement learning, so transfer learning yields better policies. Transfer learning also needs much less training time than conventional reinforcement learning to reach the same policy level.
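The abstract does not spell out the algorithm, so the following Python code is only a minimal sketch of the general idea under stated assumptions. It uses the π-reuse exploration strategy from Fernández and Veloso's probabilistic policy reuse: with probability psi the taker follows the old 4v3 policy (through a state and action mapping), otherwise it acts epsilon-greedily on the 5v4 policy being learned, and psi decays by a factor upsilon at every step. The tabular Sarsa learner, the state layout (one discretized feature per keeper, nearest first), the taker action set, the mappings `map_state` and `map_action`, the environment callback `env_step`, and all parameter names (`psi0`, `upsilon`, `epsilon`, `alpha`, `gamma`) are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, Dict, Optional, Tuple

# Assumption: a state is a tuple of discretized features, one per keeper,
# ordered from nearest to farthest keeper (hypothetical layout).
State = Tuple[int, ...]
QTable = Dict[Tuple[State, int], float]

N_KEEPERS_SOURCE = 4  # 4v3 source task
N_KEEPERS_TARGET = 5  # 5v4 target task
# Hypothetical taker actions: 0 = go to the ball, k = mark the k-th nearest
# non-holding keeper, so the 5v4 task has 5 actions and the 4v3 task has 4.
N_ACTIONS_TARGET = N_KEEPERS_TARGET


def map_state(state_5v4: State) -> State:
    """Hypothetical state mapping: keep the 4 nearest keepers, drop the farthest."""
    return state_5v4[:N_KEEPERS_SOURCE]


def map_action(action_4v3: int) -> int:
    """Hypothetical action mapping: every 4v3 action keeps its meaning in 5v4."""
    return action_4v3


def greedy(q: QTable, state: State) -> int:
    """Greedy action of the policy being learned (ties go to the lowest index)."""
    return max(range(N_ACTIONS_TARGET), key=lambda a: q.get((state, a), 0.0))


def pi_reuse_action(state: State, q_new: QTable, past_policy: Callable[[State], int],
                    psi: float, epsilon: float, rng: random.Random) -> int:
    """pi-reuse: follow the mapped 4v3 policy with probability psi,
    otherwise act epsilon-greedily on the new 5v4 Q-table."""
    if rng.random() < psi:
        return map_action(past_policy(map_state(state)))
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS_TARGET)
    return greedy(q_new, state)


def sarsa_update(q: QTable, s: State, a: int, r: float, s_next: State, a_next: int,
                 alpha: float = 0.1, gamma: float = 1.0, terminal: bool = False) -> None:
    """One tabular Sarsa step on the new Q-table (assumed learner)."""
    target = r if terminal else r + gamma * q.get((s_next, a_next), 0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)


def run_episode(env_step: Callable[[State, int], Tuple[float, State, bool]],
                s0: State, q_new: QTable, past_policy: Callable[[State], int],
                psi0: float = 1.0, upsilon: float = 0.95, epsilon: float = 0.1,
                max_steps: int = 1000, rng: Optional[random.Random] = None) -> float:
    """Run one 5v4 episode with pi-reuse exploration; return the summed reward."""
    if rng is None:
        rng = random.Random(0)
    psi = psi0
    s = s0
    a = pi_reuse_action(s, q_new, past_policy, psi, epsilon, rng)
    total = 0.0
    for _ in range(max_steps):
        r, s_next, done = env_step(s, a)
        total += r
        if done:
            sarsa_update(q_new, s, a, r, s_next, a, terminal=True)
            break
        psi *= upsilon  # reuse probability decays at every step of the episode
        a_next = pi_reuse_action(s_next, q_new, past_policy, psi, epsilon, rng)
        sarsa_update(q_new, s, a, r, s_next, a_next)
        s, a = s_next, a_next
    return total
```

In use, `past_policy` would stand for the greedy taker policy already learned in the 4v3 task, and `env_step` for one high-level decision step in a 5v4 simulator (for example a reward of -1 per step, so that shorter episodes, i.e. faster steals, score higher). A larger `upsilon` keeps the learner leaning on the old policy for longer within each episode.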

Key words: RoboCup soccer, Keepaway, Stealing policy, Policy reuse, Transfer learning

LI Xue-jun, CHEN Shi-yang, ZHANG Yi-wen and LI Long-shu. Transfer Learning Algorithm between Keepaway Tasks Based on Policy Reuse[J]. Computer Science, 2015, 42(4): 190-193.
