Computer Science ›› 2010, Vol. 37 ›› Issue (12): 186-189.

• Artificial Intelligence •

Natural Gradient Reinforcement Learning Algorithm with TD(λ)

CHEN Sheng-lei, GU Rui-jun, CHEN Geng, XUE Hui

  1. (School of Information Science, Nanjing Audit University, Nanjing 211815, China); 2. (School of Computer Science and Engineering, Southeast University, Nanjing 210096, China)
  • Online: 2018-12-01  Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (70971067, 60905002), the Major Basic Research Project of Natural Science for Jiangsu Higher Education Institutions (08KJA520001), and the Six Talent Peaks Project of Jiangsu Province (2007148).

Abstract: In recent years, policy gradient methods have attracted extensive interest in reinforcement learning for their good convergence properties. This paper investigated the natural gradient algorithm under the average-reward model. To address the low efficiency of gradient estimation in existing algorithms, the TD(λ) method was adopted to approximate the value function when estimating the gradient. The eligibility traces in TD(λ) propagate learning experience more efficiently, which reduces the variance of the gradient estimate and speeds up convergence. Simulation experiments on the cart-pole balancing system verify the effectiveness of the proposed algorithm.

Key words: Policy gradient, Natural gradient, TD(λ), Eligibility trace
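The abstract combines two standard ingredients: a TD(λ) critic whose eligibility traces spread each temporal-difference error back over recently visited states, and a natural-gradient actor update that preconditions the policy gradient with the inverse Fisher information matrix. The Python sketch below illustrates this combination in the average-reward setting. It is an illustrative reconstruction, not the paper's algorithm: the toy environment, one-hot features, step sizes, and batch size are all assumptions.

# Minimal sketch (assumed setup, not the paper's exact method): a TD(lambda)
# critic with eligibility traces, plus a batch natural-gradient policy update.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, N_FEATURES = 5, 2, 5

def phi(s):
    """One-hot state features for the linear value function (an assumption)."""
    f = np.zeros(N_FEATURES)
    f[s] = 1.0
    return f

def softmax_policy(theta, s):
    """Action probabilities of a softmax policy with one parameter per (s, a)."""
    prefs = theta[s]
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

def score(theta, s, a):
    """Score function grad log pi(a|s); nonzero only in the visited state's row."""
    g = np.zeros_like(theta)
    g[s] = -softmax_policy(theta, s)
    g[s, a] += 1.0
    return g

def step(s, a):
    """Toy average-reward MDP: random transitions, reward favors action 0."""
    r = 1.0 if a == 0 else 0.0
    return r, rng.integers(N_STATES)

theta = np.zeros((N_STATES, N_ACTIONS))   # policy parameters
w = np.zeros(N_FEATURES)                  # critic weights
z = np.zeros(N_FEATURES)                  # eligibility trace
rho = 0.0                                 # average-reward estimate
lam, alpha_w, alpha_rho, T = 0.9, 0.1, 0.01, 2000

grad_sum = np.zeros(theta.size)           # accumulated vanilla gradient estimate
fisher = 1e-3 * np.eye(theta.size)        # regularized Fisher-matrix estimate

s = 0
for t in range(T):
    a = rng.choice(N_ACTIONS, p=softmax_policy(theta, s))
    r, s_next = step(s, a)

    # TD(lambda) critic in the average-reward setting: the trace z spreads each
    # TD error delta back over recently visited features, which is what lowers
    # the variance of the downstream gradient estimate.
    delta = r - rho + w @ phi(s_next) - w @ phi(s)
    z = lam * z + phi(s)
    w += alpha_w * delta * z
    rho += alpha_rho * delta

    # Accumulate the policy-gradient and Fisher-matrix estimates from the score.
    g = score(theta, s, a).ravel()
    grad_sum += delta * g
    fisher += np.outer(g, g)
    s = s_next

# Natural gradient step: precondition the averaged gradient with the inverse
# Fisher matrix instead of following the vanilla gradient direction.
natural_grad = np.linalg.solve(fisher / T, grad_sum / T)
theta += 0.1 * natural_grad.reshape(theta.shape)
print("average reward estimate:", rho)

Solving the regularized Fisher system, rather than inverting the matrix explicitly, keeps the update numerically stable when only a few samples have been collected.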
