A Multi-Step Q Reinforcement Learning Method

Computer Science ›› 2006, Vol. 33 ›› Issue (3): 147-150.

Online: 2018-11-17 Published: 2018-11-17
Funding:
This work was supported by a National Defense Pre-Research Fund project.

Abstract

Q-learning is an important reinforcement learning algorithm. To address the shortcomings of Q-learning and the Q(λ) algorithm, this paper proposes MQ, a Q-learning method with multi-step lookahead capability. The MDP model is presented first. The MQ algorithm is then derived from an analysis of Q-learning and Q(λ), and its update strategy and the principle for choosing the value of k are analyzed. A cliff-walking simulation verifies the effectiveness of the algorithm. Both theoretical analysis and numerical experiments indicate that the algorithm has strong lookahead capability while reducing computational complexity, making it an effective reinforcement learning method that balances update speed against complexity.

Key words: Reinforcement learning, MQ algorithm, Q learning, Q(λ) algorithm
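The abstract does not state the MQ update rule itself, so the following is only a minimal sketch of a generic k-step Q-learning update on a cliff-walking grid, not the paper's algorithm. It assumes the common 4x12 cliff-walking layout and an uncorrected k-step target (k discounted rewards plus a bootstrapped maximum). The names `step`, `k_step_target`, and `train`, the grid constants, and all hyperparameter defaults are hypothetical choices made for illustration.

```python
import random
from collections import deque

# Assumed layout: 4x12 grid, start bottom-left, goal bottom-right,
# cliff cells along the bottom row between them.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """One cliff-walking transition: returns (next_state, reward, done)."""
    r = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    if r == 3 and 0 < c < COLS - 1:  # fell off the cliff: penalty, back to start
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL


def k_step_target(rewards, bootstrap, gamma):
    """Discounted sum of the rewards plus gamma**len(rewards) * bootstrap."""
    g = 0.0
    for i, rew in enumerate(rewards):
        g += (gamma ** i) * rew
    return g + (gamma ** len(rewards)) * bootstrap


def train(episodes=200, k=3, alpha=0.1, gamma=0.9, epsilon=0.1,
          seed=0, max_steps=500):
    """Tabular k-step Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state, done = START, False
        window = deque()  # pending (state, action, reward), oldest first
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                action = max(range(4), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            window.append((state, action, reward))
            state = nxt
            # Update the oldest pair once k rewards are known;
            # at the goal, flush every pending pair with no bootstrap.
            while window and (len(window) == k or done):
                s0, a0, _ = window[0]
                boot = 0.0 if done else max(q[state])
                target = k_step_target([w[2] for w in window], boot, gamma)
                q[s0][a0] += alpha * (target - q[s0][a0])
                window.popleft()
            if done:
                break
        # Pairs still pending when max_steps is hit are simply dropped.
    return q
```

With `k=1` the sketch reduces to ordinary one-step Q-learning. Larger `k` lets each update use more observed rewards before bootstrapping, at the cost of holding `k` pending transitions per update, which is the kind of speed-versus-cost trade-off the abstract refers to when it discusses choosing k.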

A Multi-Step Q Reinforcement Learning Method[J]. Computer Science, 2006, 33(3): 147-150. https://doi.org/
