Improved Deep Deterministic Policy Gradient Algorithm and Its Application in Control

Abstract

Deep reinforcement learning often suffers from low sampling efficiency, which prioritized sampling can improve to some extent. Prioritized experience replay is applied to the deep deterministic policy gradient (DDPG) algorithm, and a small-sample sorting method is proposed to address the high complexity of the general prioritized experience replay algorithm. Simulation results show that the improved DDPG algorithm samples more efficiently and trains better. The algorithm is applied to the direction control of a car. Compared with traditional PID control, it avoids manual parameter tuning and has wider application prospects.
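
The abstract does not spell out the small-sample sorting method, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a small candidate set is drawn uniformly from the replay buffer, that only those candidates are sorted by priority (here the absolute TD error), and that the highest-priority candidates form the minibatch. The class name `SmallSampleReplay` and the parameters `candidate_size` and `batch_size` are hypothetical.

```python
import random


class SmallSampleReplay:
    """Hypothetical replay buffer that ranks only a small random candidate set."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.transitions = []      # stored transitions
        self.priorities = []       # one priority per stored transition
        self.pos = 0               # next slot to overwrite once the buffer is full
        self.max_priority = 1.0    # default priority for new transitions
        self.rng = random.Random(seed)

    def add(self, transition, priority=None):
        # Assumption: a new transition gets the largest priority seen so far.
        if priority is None:
            priority = self.max_priority
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(priority)
        else:
            self.transitions[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity
        self.max_priority = max(self.max_priority, priority)

    def sample(self, batch_size, candidate_size):
        # Draw a small candidate set uniformly, then sort only those candidates.
        m = min(candidate_size, len(self.transitions))
        idx = self.rng.sample(range(len(self.transitions)), m)
        idx.sort(key=lambda i: self.priorities[i], reverse=True)
        chosen = idx[:batch_size]
        return chosen, [self.transitions[i] for i in chosen]

    def update(self, indices, td_errors):
        # Refresh priorities from the critic's TD errors after a training step.
        for i, err in zip(indices, td_errors):
            p = abs(err) + 1e-6    # small constant keeps every priority positive
            self.priorities[i] = p
            self.max_priority = max(self.max_priority, p)
```

Under these assumptions, each draw sorts `candidate_size` items rather than the whole buffer, so the sorting cost per minibatch depends on the candidate count and not on the buffer size.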

Key words: Deep deterministic policy gradient, Deep reinforcement learning, Direction control, Prioritized experience replay

CLC Number: TP183

ZHANG Hao-yu, XIONG Kai. Improved Deep Deterministic Policy Gradient Algorithm and Its Application in Control[J]. Computer Science, 2019, 46(6A): 555-557.

