Computer Science, 2014, Vol. 41, Issue (9): 232-238. doi: 10.11896/j.issn.1002-137X.2014.09.044
ZHOU Xin, LIU Quan, FU Qi-ming and XIAO Fei