Natural Gradient Reinforcement Learning Algorithm with TD(λ)

Computer Science, 2010, Vol. 37, Issue (12): 186-189.

CHEN Sheng-lei, GU Rui-jun, CHEN Geng, XUE Hui

Online: 2018-12-01 Published: 2018-12-01

Abstract

In recent years, policy gradient methods have attracted extensive interest in reinforcement learning because of their excellent convergence properties. This paper investigates natural gradient algorithms. To address the low efficiency of gradient estimation in existing algorithms, the TD(λ) method is used to approximate the value functions when estimating the gradient. The eligibility traces in TD(λ) propagate learning experience more efficiently. As a result, the variance of the gradient estimate can be reduced and the convergence speed can be improved. Simulation experiments on a cart-pole balancing system demonstrate the effectiveness of the algorithm.
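To illustrate the eligibility-trace mechanism mentioned above, the following is a minimal sketch of tabular TD(λ) value estimation with accumulating traces. It is not the paper's algorithm: the random-walk environment, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen for illustration, and the natural-gradient policy update is omitted.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=3000, alpha=0.02,
                          gamma=1.0, lam=0.8, seed=0):
    """Estimate state values of a symmetric random walk with tabular TD(lambda).

    Hypothetical setup: the agent starts in the middle state, moves left or
    right with equal probability, and receives reward 1 only when it exits
    on the right side. Exiting on either side ends the episode.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states            # one value estimate per state
    for _ in range(episodes):
        traces = [0.0] * n_states        # eligibility traces, reset per episode
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            v_next = 0.0 if done else values[next_state]

            # One-step TD error for the observed transition.
            delta = reward + gamma * v_next - values[state]

            # Decay every trace, then mark the state just visited.
            traces = [gamma * lam * e for e in traces]
            traces[state] += 1.0

            # Every recently visited state shares in the same TD error,
            # which is how experience propagates backwards along the path.
            values = [v + alpha * delta * e for v, e in zip(values, traces)]

            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

In this assumed environment the true value of state i (counting from 1) is i/6, so the estimates should increase from left to right, up to sampling noise. Setting `lam=0.0` reduces the update to one-step TD, where only the current state is adjusted at each step.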

Key words: Policy gradient, Natural gradient, TD(λ), Eligibility trace

CHEN Sheng-lei, GU Rui-jun, CHEN Geng, XUE Hui. Natural Gradient Reinforcement Learning Algorithm with TD(λ)[J]. Computer Science, 2010, 37(12): 186-189.