Improved Deep Deterministic Policy Gradient Algorithm and Its Application in Control

Computer Science, 2019, Vol. 46, Issue (6A): 555-557.

ZHANG Hao-yu, XIONG Kai

Science and Technology on Space Intelligent Control Laboratory, Beijing Institute of Control Engineering, Beijing 100190, China

Online: 2019-06-14 Published: 2019-07-02
Corresponding author: XIONG Kai (born 1976), male, Ph.D., research professor. His main research interests are adaptive filtering and autonomous spacecraft navigation. E-mail: 17600517255@163.com.
About the author: ZHANG Hao-yu (born 1994), male, master's student. His main research interest is deep reinforcement learning. E-mail: Haoy_Zhang@163.com.
Funding: This work was supported by the Beijing Natural Science Foundation (4162070) and the National Natural Science Foundation of China (61573059).

Abstract

Abstract: Deep reinforcement learning often suffers from low sampling efficiency, and prioritized sampling can improve it to some extent. This paper applies prioritized experience replay to the deep deterministic policy gradient (DDPG) algorithm and, to address the high computational complexity of ordinary prioritized sampling, proposes a small-sample sorting approach. Simulation results show that the improved DDPG algorithm raises sampling efficiency and achieves good training performance. The DDPG algorithm is also applied to the direction control of a car; compared with traditional PID control, it avoids manual parameter tuning and has broader application prospects.

Key words: Deep deterministic policy gradient, Deep reinforcement learning, Direction control, Prioritized experience replay
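
The abstract does not spell out the small-sample sorting procedure. One plausible reading, offered only as a sketch and not as the authors' implementation, is to draw a small candidate set uniformly from the replay buffer, sort just those candidates by priority, and train on the highest-priority ones. This replaces a sort over the whole buffer with a sort over a few hundred items. The function name `sample_minibatch`, the parameters `candidate_size` and `batch_size`, and the use of absolute TD error as the priority are all assumptions made for illustration.

```python
import random


def sample_minibatch(priorities, candidate_size, batch_size, rng=random):
    """Pick a minibatch by sorting a small uniform sample of the buffer.

    priorities     -- one non-negative float per stored transition
                      (assumed here to be the absolute TD error)
    candidate_size -- number of transitions drawn uniformly before sorting
    batch_size     -- number of top-priority candidates kept for training
    Returns a list of buffer indices, highest priority first.
    """
    n = len(priorities)
    if not 0 < batch_size <= candidate_size <= n:
        raise ValueError(
            "need 0 < batch_size <= candidate_size <= len(priorities)"
        )
    # Step 1: uniform draw without replacement from the whole buffer.
    candidates = rng.sample(range(n), candidate_size)
    # Step 2: sort only the small candidate set, highest priority first.
    candidates.sort(key=lambda i: priorities[i], reverse=True)
    # Step 3: keep the top batch_size candidates as the minibatch.
    return candidates[:batch_size]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical priorities standing in for |TD error| of 10,000 transitions.
    td_errors = [abs(rng.gauss(0.0, 1.0)) for _ in range(10_000)]
    batch = sample_minibatch(td_errors, candidate_size=256, batch_size=64, rng=rng)
    mean_all = sum(td_errors) / len(td_errors)
    mean_batch = sum(td_errors[i] for i in batch) / len(batch)
    print(f"mean priority, whole buffer: {mean_all:.3f}")
    print(f"mean priority, minibatch:    {mean_batch:.3f}")
```

In this sketch the sorting cost depends on `candidate_size` rather than on the buffer length, while the minibatch is still biased toward high-priority transitions. Setting `candidate_size` equal to `batch_size` reduces it to plain uniform sampling.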

CLC Number: TP183

ZHANG Hao-yu, XIONG Kai. Improved Deep Deterministic Policy Gradient Algorithm and Its Application in Control[J]. Computer Science, 2019, 46(6A): 555-557. https://doi.org/

