一种新的基于函数逼近协同更新的DQN算法

doi:10.11896/JsJkx.190700038

Computer Science ›› 2020, Vol. 47 ›› Issue (6A): 130-134.doi: 10.11896/JsJkx.190700038

• Artificial Intelligence • Previous Articles Next Articles

Novel DQN Algorithm Based on Function Approximation and Collaborative Update Mechanism

LIU Qing-song^{1, 2}, CHEN Jian-ping^{1, 2}, FU Qi-ming^{1, 2}, GAO Zhen¹, LU You¹ and WU Hong-Jie¹

1 College of Electronics and Information Engineering,Suzhou University of Science and Technology,Suzhou,Jiangsu 215009,China
2 Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency,Suzhou University of Science and Technology,Suzhou,Jiangsu 215009,China

Published:2020-07-07
About author:LIU Qing-song, master candidate.His main research interests include reinforcement learning and building energy efficiency.
CHEN Jian-ping, doctor, professor. His research interests include big data and analytics, building energy efficiency, and intelligent information.
Supported by:
This work was supported by the National Natural Science Foundation of China(61772357,61750110519,61772355,61702055,61672371,61602334) and Natural Science Foundation of the Jiangsu Higher Education Institutions of China(BE2017663).

Abstract

Abstract: With respect to the problem that the classical DQN (Deep Q-Network) algorithm has slow convergence in the early stage of the training process,this paper proposes a novel DQN algorithm based on function approximation and collaborative update mechanism,which combines the linear function method with the classical DQN algorithm.In the early stage of the training,the linear function network is used to replace the behavior value function network and proposed an update rule from the strategy value function,which can accelerate the parameter optimization process of the neural network and speed up the convergence rate.The proposed algorithm and DQN algorithm are applied to the CartPole and Mountain Car problems,and the experimental results show that the proposed algorithm has faster convergence rate.

Key words: Deep Q-Network, Linear function, MDP, Reinforcement learning

CLC Number:

TP391

LIU Qing-song, CHEN Jian-ping, FU Qi-ming, GAO Zhen, LU You and WU Hong-Jie. Novel DQN Algorithm Based on Function Approximation and Collaborative Update Mechanism[J].Computer Science, 2020, 47(6A): 130-134.

References

[1] SUTTON R S.Learning to Predict by the Methods of Temporal Differences.Machine Learning,1988,3(1):9-44.
[2] ZHANG R B,GU G C,LIU Z D,et al.Reinforcement Learning Theory,Algorithms and Its Application.Control Theory and Application,2000,17(5):637-642.
[3] SUTTON R S,BARTO A G.Reinforcement learning:An intro-duction.Massachusetts:MIT press,2018.
[4] WATKINS C J C H,DAYAN P.Q-learning.Machine Learning,1992,8(3/4):279-292.
[5] HINTON G E,SALAKHUTDINOV R R.Reducing the dimensional-ity of data with neural networks.Science,2006,313(5786):504-507.
[6] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level con-trol through deep reinforcement learning.Nature,2015,518(7540):529.
[7] OSBAND I,BLUNDELL C,PRITZEL A,et al.Deep exploration via bootstrapped DQN//Advances in Neural Information Processing Systems.Barcelona,Spain,2016:4026-4034.
[8] ANSCHEL O,BARAM N,SHIMKIN N.Averaged-DQN:Variance Reduction and Stabilization for Deep Reinforcement Learn-ing//Advances in International Conference on Machine Learning.New York,USA,2016.
[9] WANG Z,SCHAUL T,HESSEL M,et al.Dueling network architectures for deep reinforcement learning.arXiv:1511.06581,2015.
[10] LEVINE S,FINN C,DARRELL T,et al.End-to-End Training of Deep Visuomotor Policies.Journal of Machine Learning Research,2015,17(39):1-40.
[11] LEVINE S,PASTOR P,KRIZHEVSKY A,et al.Learning hand-eye coordination for robotic grasping with large-scale data collection//Proceedings of International Symposium on Experimental Robotics.Berlin:Springer,2016:173-184.

Related Articles 15

[1]	LIU Xing-guang, ZHOU Li, LIU Yan, ZHANG Xiao-ying, TAN Xiang, WEI Ji-bo. Construction and Distribution Method of REM Based on Edge Intelligence [J]. Computer Science, 2022, 49(9): 236-241.
[2]	YUAN Wei-lin, LUO Jun-ren, LU Li-na, CHEN Jia-xing, ZHANG Wan-peng, CHEN Jing. Methods in Adversarial Intelligent Game:A Holistic Comparative Analysis from Perspective of Game Theory and Reinforcement Learning [J]. Computer Science, 2022, 49(8): 191-204.
[3]	SHI Dian-xi, ZHAO Chen-ran, ZHANG Yao-wen, YANG Shao-wu, ZHANG Yong-jun. Adaptive Reward Method for End-to-End Cooperation Based on Multi-agent Reinforcement Learning [J]. Computer Science, 2022, 49(8): 247-256.
[4]	YU Bin, LI Xue-hua, PAN Chun-yu, LI Na. Edge-Cloud Collaborative Resource Allocation Algorithm Based on Deep Reinforcement Learning [J]. Computer Science, 2022, 49(7): 248-253.
[5]	LI Meng-fei, MAO Ying-chi, TU Zi-jian, WANG Xuan, XU Shu-fang. Server-reliability Task Offloading Strategy Based on Deep Deterministic Policy Gradient [J]. Computer Science, 2022, 49(7): 271-279.
[6]	XIE Wan-cheng, LI Bin, DAI Yue-yue. PPO Based Task Offloading Scheme in Aerial Reconfigurable Intelligent Surface-assisted Edge Computing [J]. Computer Science, 2022, 49(6): 3-11.
[7]	HONG Zhi-li, LAI Jun, CAO Lei, CHEN Xi-liang, XU Zhi-xiong. Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration [J]. Computer Science, 2022, 49(6): 149-157.
[8]	GUO Yu-xin, CHEN Xiu-hong. Automatic Summarization Model Combining BERT Word Embedding Representation and Topic Information Enhancement [J]. Computer Science, 2022, 49(6): 313-318.
[9]	FAN Jing-yu, LIU Quan. Off-policy Maximum Entropy Deep Reinforcement Learning Algorithm Based on RandomlyWeighted Triple Q -Learning [J]. Computer Science, 2022, 49(6): 335-341.
[10]	ZHANG Jia-neng, LI Hui, WU Hao-lin, WANG Zhuang. Exploration and Exploitation Balanced Experience Replay [J]. Computer Science, 2022, 49(5): 179-185.
[11]	LI Peng, YI Xiu-wen, QI De-kang, DUAN Zhe-wen, LI Tian-rui. Heating Strategy Optimization Method Based on Deep Learning [J]. Computer Science, 2022, 49(4): 263-268.
[12]	OUYANG Zhuo, ZHOU Si-yuan, LYU Yong, TAN Guo-ping, ZHANG Yue, XIANG Liang-liang. DRL-based Vehicle Control Strategy for Signal-free Intersections [J]. Computer Science, 2022, 49(3): 46-51.
[13]	ZHOU Qin, LUO Fei, DING Wei-chao, GU Chun-hua, ZHENG Shuai. Double Speedy Q-Learning Based on Successive Over Relaxation [J]. Computer Science, 2022, 49(3): 239-245.
[14]	LI Su, SONG Bao-yan, LI Dong, WANG Jun-lu. Composite Blockchain Associated Event Tracing Method for Financial Activities [J]. Computer Science, 2022, 49(3): 346-353.
[15]	HUANG Xin-quan, LIU Ai-jun, LIANG Xiao-hu, WANG Heng. Load-balanced Geographic Routing Protocol in Aerial Sensor Network [J]. Computer Science, 2022, 49(2): 342-352.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 0

No Suggested Reading articles found!

Novel DQN Algorithm Based on Function Approximation and Collaborative Update Mechanism

PDF (PC)

Abstract

Cite this article

share this article

References

Related Articles 15

Metrics

Comments

Recommended 0