计算机科学 (Computer Science) ›› 2020, Vol. 47 ›› Issue (6A): 130-134. doi: 10.11896/JsJkx.190700038
LIU Qing-song1,2, CHEN Jian-ping1,2, FU Qi-ming1,2, GAO Zhen1, LU You1 and WU Hong-jie1
Abstract: To address the slow convergence of the classic Deep Q-Network (DQN) algorithm in the early stage of training, this paper proposes a new DQN algorithm based on cooperative updating with function approximation. Building on the classic DQN, the algorithm incorporates a linear function method: in the early stage of training, a linear function approximator replaces the action-value network, and an off-policy value-function update rule is proposed so that the approximator updates the value-function parameters cooperatively with DQN, accelerating the optimization of the network parameters and thereby the convergence of the algorithm. The improved algorithm and DQN are applied to the CartPole and Mountain Car problems, and the experimental results show that the improved algorithm converges faster.
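The linear component described in the abstract can be illustrated with a minimal sketch of an off-policy (Q-learning-style) TD update on a linear action-value function Q(s,a) = wᵀφ(s,a). The one-hot-per-action feature map, the step sizes, and the function names below are illustrative assumptions; the paper's exact feature construction and its cooperative-update schedule with the DQN network are not reproduced here.

```python
import numpy as np

def features(state, action, n_actions):
    """phi(s, a): copy the state vector into the block for the chosen action."""
    phi = np.zeros(len(state) * n_actions)
    phi[action * len(state):(action + 1) * len(state)] = state
    return phi

def q_value(w, state, action, n_actions):
    """Linear action-value estimate Q(s, a) = w . phi(s, a)."""
    return w @ features(state, action, n_actions)

def linear_q_update(w, s, a, r, s_next, done, n_actions, alpha=0.1, gamma=0.99):
    """One off-policy TD step: target uses max over next actions (Q-learning)."""
    target = r
    if not done:
        target += gamma * max(q_value(w, s_next, b, n_actions)
                              for b in range(n_actions))
    td_error = target - q_value(w, s, a, n_actions)
    return w + alpha * td_error * features(s, a, n_actions)
```

In the scheme the abstract outlines, updates of this form would drive the value estimates during early training, while the DQN network parameters are optimized in parallel; once the network's estimates mature, it takes over from the linear approximator.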