Computer Science, 2015, Vol. 42, Issue 4: 190-193. doi: 10.11896/j.issn.1002-137X.2015.04.038
LI Xue-jun, CHEN Shi-yang, ZHANG Yi-wen and LI Long-shu
Abstract: In RoboCup Keepaway, players can learn good high-level policies through reinforcement learning. However, because the state space of the Keepaway task is enormous, reinforcement learning requires many exploration steps to converge, making the learning process very time-consuming. To address this problem, for the 5v4 Keepaway task, policy reuse is applied to the reinforcement learning of the takers' high-level decisions so as to achieve transfer learning. First, a transfer learning scheme between the 4v3 and 5v4 tasks is designed, together with the mapping between their state and action spaces; then a transfer learning algorithm based on policy reuse is proposed. Experiments show that for the 5v4 task, under a training-time constraint, transfer learning achieves shorter episode completion times and a higher interception success rate than plain reinforcement learning, and thus learns a better high-level policy. Consequently, to reach the same policy level, transfer learning requires significantly less training time than reinforcement learning.
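The core mechanism described above, reusing a policy learned on the smaller 4v3 task while learning the 5v4 task through an inter-task state mapping, can be sketched as a π-reuse action-selection rule: with probability ψ follow the old policy (applied to the mapped state), otherwise act ε-greedily on the new task's value function. This is a minimal illustrative sketch, not the paper's implementation; all names (`pi_reuse_action`, `state_map`, the toy states and actions) are hypothetical.

```python
import random

def pi_reuse_action(q_new, state, actions, old_policy, state_map,
                    psi, epsilon):
    """One action choice under pi-reuse-style transfer:
    with probability psi, follow the source-task (4v3) policy via the
    inter-task state mapping; otherwise act epsilon-greedily on the
    target-task (5v4) Q-values. Illustrative sketch only."""
    if random.random() < psi:
        return old_policy[state_map(state)]           # reuse past policy
    if random.random() < epsilon:
        return random.choice(actions)                 # explore
    return max(actions, key=lambda a: q_new[(state, a)])  # greedy on new Q

# Toy usage: a 5v4-like state is mapped to its 4v3 analogue by dropping
# one feature (a hypothetical mapping, for illustration only).
actions = ["hold", "pass1", "pass2"]
old_policy = {("near",): "hold"}                      # learned 4v3 policy
q_new = {(("near", "far"), a): 0.0 for a in actions}  # 5v4 Q-table
q_new[(("near", "far"), "pass1")] = 1.0
state_map = lambda s: s[:1]                           # 5v4 state -> 4v3 state

a = pi_reuse_action(q_new, ("near", "far"), actions, old_policy,
                    state_map, psi=1.0, epsilon=0.1)
print(a)  # with psi=1.0 the old policy is always reused -> "hold"
```

In practice ψ is decayed over the episode, so the learner leans on the transferred policy early on and gradually shifts to its own Q-values as they improve.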