计算机科学 ›› 2020, Vol. 47 ›› Issue (6A): 130-134. doi: 10.11896/JsJkx.190700038

• 人工智能 •

一种新的基于函数逼近协同更新的DQN算法

刘青松1, 2, 陈建平1, 2, 傅启明1, 2, 高振1, 陆悠1, 吴宏杰1   

  1 苏州科技大学电子与信息工程学院 江苏 苏州 215009;
    2 苏州科技大学江苏省建筑智慧节能重点实验室 江苏 苏州 215009
  • 发布日期:2020-07-07
  • 通讯作者: 陈建平(alanJpchen@yahoo.com)
  • 作者简介:1622703301@qq.com
  • 基金资助:
    国家自然科学基金(61772357,61750110519,61772355,61702055,61672371,61602334);江苏省重点研发计划项目(BE2017663)

Novel DQN Algorithm Based on Function Approximation and Collaborative Update Mechanism

LIU Qing-song1, 2, CHEN Jian-ping1, 2, FU Qi-ming1, 2, GAO Zhen1, LU You1 and WU Hong-jie1

  1 College of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
    2 Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  • Published:2020-07-07
  • About author:LIU Qing-song, master candidate.His main research interests include reinforcement learning and building energy efficiency.
    CHEN Jian-ping, doctor, professor. His research interests include big data and analytics, building energy efficiency, and intelligent information.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61772357, 61750110519, 61772355, 61702055, 61672371, 61602334) and the Key Research and Development Program of Jiangsu Province (BE2017663).

摘要: 针对经典深度Q网络(Deep Q-Network,DQN)算法在训练初期收敛速度慢的问题,文中提出一种新的基于函数逼近协同更新的DQN算法。该算法在经典的DQN算法的基础上融合了线性函数方法,在训练的初期利用线性函数逼近器来代替神经网络中的行为值函数网络,并提出一种离策略值函数更新规则,与DQN协同更新值函数参数,加快神经网络的参数优化,进而加快算法的收敛速度。将改进后的算法与DQN算法用于CartPole和Mountain Car问题,实验结果表明,改进后的算法具有更快的收敛速度。

关键词: DQN, MDP, 强化学习, 线性函数

Abstract: To address the problem that the classical DQN (Deep Q-Network) algorithm converges slowly in the early stage of training, this paper proposes a novel DQN algorithm based on function approximation and a collaborative update mechanism, which combines the linear function method with the classical DQN algorithm. In the early stage of training, a linear function approximator is used in place of the behavior (action-value) network, and an off-policy value function update rule is proposed so that the linear approximator and DQN update the value function parameters collaboratively, which accelerates the parameter optimization of the neural network and thus speeds up convergence. The proposed algorithm and the DQN algorithm are applied to the CartPole and Mountain Car problems, and the experimental results show that the proposed algorithm converges faster.
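
The abstract does not spell out the exact update rule, so the Python sketch below only illustrates one plausible reading of the mechanism: during the first few thousand steps a linear approximator Q_w(s,a) = w_a · φ(s) is trained with an off-policy, Q-learning-style rule on the same mini-batches and targets as the DQN, and is also used to select actions until the network takes over. The feature map phi, the switching step SWITCH_STEP, the hyper-parameters, and the exact form of the shared target are all assumptions made for illustration, not the authors' implementation.

    import random
    import numpy as np
    import torch
    import torch.nn as nn

    GAMMA, ALPHA, SWITCH_STEP = 0.99, 0.01, 2000      # assumed hyper-parameters

    def phi(state):
        # Feature vector for the linear approximator: raw state plus a bias term (assumed).
        return np.append(np.asarray(state, dtype=np.float64), 1.0)

    class QNetwork(nn.Module):
        # Small MLP standing in for the DQN behavior-value network.
        def __init__(self, n_state, n_action):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_action))
        def forward(self, x):
            return self.net(x)

    def collaborative_update(w, q_net, target_net, optimizer, batch, step):
        # One update on a mini-batch of (s, a, r, s2, done) transitions.
        s, a, r, s2, done = map(np.asarray, zip(*batch))
        s_t  = torch.as_tensor(s,  dtype=torch.float32)
        s2_t = torch.as_tensor(s2, dtype=torch.float32)

        # Standard DQN loss against a target network.
        q = q_net(s_t).gather(1, torch.as_tensor(a).long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = torch.as_tensor(r, dtype=torch.float32) + \
                     GAMMA * target_net(s2_t).max(1).values * (1.0 - torch.as_tensor(done, dtype=torch.float32))
        loss = nn.functional.mse_loss(q, target)
        optimizer.zero_grad(); loss.backward(); optimizer.step()

        # Early-stage off-policy linear update, w_a <- w_a + alpha * (y - w_a . phi(s)) * phi(s),
        # sharing the same targets y so the two approximators are updated "collaboratively".
        if step < SWITCH_STEP:
            for i in range(len(batch)):
                td = target[i].item() - float(w[a[i]] @ phi(s[i]))
                w[a[i]] += ALPHA * td * phi(s[i])
        return w

    def select_action(state, w, q_net, step, eps=0.1):
        # Epsilon-greedy: act from the linear Q early on, from the DQN afterwards.
        if random.random() < eps:
            return random.randrange(w.shape[0])
        if step < SWITCH_STEP:
            return int(np.argmax(w @ phi(state)))
        with torch.no_grad():
            return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

On CartPole, for example, w would have shape (2, 5) (two actions, four state variables plus a bias feature); such a linear model can produce a usable value estimate from far fewer samples than the network, which is the intuition behind the faster early convergence reported in the abstract.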

Key words: Deep Q-Network, Linear function, MDP, Reinforcement learning

中图分类号: TP391