Computer Science (计算机科学), 2012, Vol. 39, Issue (Z6): 235-237.
QIAO Lin, LUO Jie (乔林, 罗杰)
Abstract: The traditional Q-learning algorithm is based on a single reward standard. When the environment and the state change in a multi-agent system (MAS), a single reward standard may fail to adapt to the new environment and state, and may instead restrict learning efficiency. This paper proposes a multi-agent Q-learning algorithm with multiple reward standards. It adapts well to changing environments and states: the task is completed in stages, and each stage uses a different reward standard, so each stage goal can be reached quickly. The simulation platform is a pursuit problem in a three-dimensional world, in which we increased the difficulty of capturing the target as well as the complexity of the environment and state. Simulation results show that the Q-learning algorithm based on multiple reward standards can flexibly adapt to different environments and states and efficiently complete learning tasks.
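The staged reward idea in the abstract can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation: the 5x5x5 grid, the single pursuer chasing a stationary evader, the stage names `approach` and `capture`, the distance threshold that switches stages, and every reward value are hypothetical choices made only for illustration. The one idea taken from the paper is that each stage scores transitions with its own reward standard; here the current stage is also folded into the Q-table state.

```python
import random
from collections import defaultdict

# Hypothetical 3D grid world (not from the paper): cells are (x, y, z) tuples.
SIZE = 5
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def move(pos, action):
    # Apply a unit step and clamp to the grid boundary.
    return tuple(min(max(p + d, 0), SIZE - 1) for p, d in zip(pos, action))


def stage_of(pursuer, evader, near=2):
    # Hypothetical stage rule: "approach" while far away, "capture" once close.
    return "approach" if manhattan(pursuer, evader) > near else "capture"


def reward(stage, old, new, evader):
    # Each stage applies its own (hypothetical) reward standard.
    if stage == "approach":
        # Approach stage: +1 for closing the distance, -1 otherwise.
        return 1.0 if manhattan(new, evader) < manhattan(old, evader) else -1.0
    # Capture stage: large reward only for reaching the evader's cell.
    return 10.0 if new == evader else -0.1


def state(pursuer, evader):
    # State = current stage plus the evader's offset relative to the pursuer.
    rel = tuple(e - p for p, e in zip(pursuer, evader))
    return (stage_of(pursuer, evader),) + rel


def q_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    # Standard one-step Q-learning update; no bootstrapping from terminal states.
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in range(len(ACTIONS)))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def train(episodes=500, max_steps=50, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        pursuer = (0, 0, 0)
        evader = pursuer
        while evader == pursuer:  # place the stationary evader elsewhere
            evader = tuple(rng.randrange(SIZE) for _ in range(3))
        for _ in range(max_steps):
            s = state(pursuer, evader)
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda b: Q[(s, b)])
            new = move(pursuer, ACTIONS[a])
            r = reward(s[0], pursuer, new, evader)  # the stage selects the standard
            done = new == evader
            q_update(Q, s, a, r, state(new, evader), done)
            pursuer = new
            if done:
                break
    return Q
```

Because the stage is part of the state, the same relative offset can hold different values under the approach and capture standards. A multi-agent version would run one such learner per pursuer and would need a moving evader; this sketch deliberately leaves both out.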
Key words: Q-learning algorithm, multi-standard of reward, MAS, pursuit problem in a three-dimensional world
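QIAO Lin, LUO Jie. Research on the Q-learning Algorithm Based on Multiple Reward Standards in MAS (MAS中基于多奖惩标准的Q学习算法研究) [J]. Computer Science, 2012, 39(Z6): 235-237. https://doi.org/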
Link to this article: https://www.jsjkx.com/CN/Y2012/V39/IZ6/235