Computer Science ›› 2015, Vol. 42 ›› Issue (4): 190-193. doi: 10.11896/j.issn.1002-137X.2015.04.038

• Artificial Intelligence •

Transfer Learning Algorithm between Keepaway Tasks Based on Policy Reuse

LI Xue-jun, CHEN Shi-yang, ZHANG Yi-wen and LI Long-shu

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, China
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    Natural Science Foundation of Anhui Province (1408085MF132) and the Anhui University Training Program for Young Backbone Teachers (02303301)

Abstract: In RoboCup Keepaway, players can learn good high-level policies through reinforcement learning. However, because the state space of Keepaway tasks is very large, reinforcement learning requires many exploration steps to converge, and the learning process is highly time-consuming. To address this problem, for the 5v4 Keepaway task, policy reuse is applied to the reinforcement learning of the takers' high-level decisions, thereby achieving transfer learning. First, a transfer scheme between the 4v3 and 5v4 tasks, together with the mappings between their state and action spaces, was designed; then a transfer learning algorithm based on policy reuse was proposed. Experiments show that, under the same training-time budget on the 5v4 task, transfer learning achieves a shorter task completion time and a higher ball-stealing success rate than plain reinforcement learning, i.e., it learns a better high-level policy. Accordingly, to reach the same policy quality, transfer learning requires markedly less training time than reinforcement learning.

Key words: RoboCup soccer, Keepaway, Ball-stealing policy, Policy reuse, Transfer learning
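
The algorithmic details are in the full text; as a rough sketch of the idea the abstract outlines — pi-reuse exploration in the style of Fernández and Veloso layered on SARSA, with a policy learned on the smaller 4v3 task transferred into the 5v4 task through hand-designed state and action mappings — a minimal Python toy might look as follows. Everything here (ToyEnv, map_state, map_action, old_policy, and all parameter values) is an illustrative assumption, not the paper's implementation; in particular, the real Keepaway tasks have continuous state variables handled with tile-coding function approximation, which this tabular sketch omits.

    import random
    from collections import defaultdict

    class ToyEnv:
        """Stand-in for a 5v4 Keepaway taker task: a handful of abstract
        states and three high-level actions. Purely illustrative."""
        actions = (0, 1, 2)

        def reset(self):
            self.s, self.t = 0, 0
            return self.s

        def step(self, a):
            self.t += 1
            self.s = (self.s + a + 1) % 5
            r = 1.0 if self.s == 4 else 0.0   # "ball stolen" reward
            done = r > 0 or self.t >= 20      # episode ends on steal or timeout
            return self.s, r, done

    def map_state(s):       # hypothetical 5v4 -> 4v3 state mapping
        return min(s, 3)

    def map_action(a):      # hypothetical 4v3 -> 5v4 action mapping
        return a

    def old_policy(s):      # stands in for the policy learned on the 4v3 task
        return (s + 1) % len(ToyEnv.actions)

    def choose(env, Q, s, psi, eps):
        """pi-reuse action selection: with probability psi follow the
        transferred old policy, otherwise act eps-greedily on the new Q."""
        if random.random() < psi:
            return map_action(old_policy(map_state(s)))
        if random.random() < eps:
            return random.choice(env.actions)
        return max(env.actions, key=lambda x: Q[(s, x)])

    def pi_reuse_episode(env, Q, psi=1.0, decay=0.95, eps=0.1,
                         alpha=0.1, gamma=0.99):
        """One SARSA episode with pi-reuse exploration; psi decays so the
        agent gradually trusts its own policy over the transferred one."""
        s = env.reset()
        a = choose(env, Q, s, psi, eps)
        done = False
        while not done:
            s2, r, done = env.step(a)
            psi *= decay
            a2 = None if done else choose(env, Q, s2, psi, eps)
            target = r if done else r + gamma * Q[(s2, a2)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])   # one-step SARSA backup
            s, a = s2, a2
        return Q

    Q = defaultdict(float)
    env = ToyEnv()
    for _ in range(500):
        pi_reuse_episode(env, Q)

The design point the abstract leans on is the decaying reuse probability psi: early in each episode the taker mostly imitates the transferred 4v3 policy, which biases exploration toward regions of the 5v4 state space already known to be useful, and this is what shortens training relative to learning from scratch.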

