Computer Science ›› 2020, Vol. 47 ›› Issue (6A): 130-134.doi: 10.11896/JsJkx.190700038

• Artificial Intelligence •

Novel DQN Algorithm Based on Function Approximation and Collaborative Update Mechanism

LIU Qing-song1,2, CHEN Jian-ping1,2, FU Qi-ming1,2, GAO Zhen1, LU You1 and WU Hong-jie1

  1 College of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  2 Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  • Published: 2020-07-07
  • About author:LIU Qing-song, master candidate.His main research interests include reinforcement learning and building energy efficiency.
    CHEN Jian-ping, Ph.D., professor. His research interests include big data and analytics, building energy efficiency, and intelligent information.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61772357, 61750110519, 61772355, 61702055, 61672371, 61602334) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (BE2017663).

Abstract: To address the slow convergence of the classical DQN (Deep Q-Network) algorithm in the early stage of training, this paper proposes a novel DQN algorithm based on function approximation and a collaborative update mechanism, which combines the linear function method with the classical DQN algorithm. In the early stage of training, a linear function network replaces the behavior value function network, and an update rule derived from the strategy value function is proposed; together, these accelerate the parameter optimization of the neural network and speed up convergence. The proposed algorithm and the classical DQN algorithm are applied to the CartPole and Mountain Car problems, and the experimental results show that the proposed algorithm converges faster.
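The early-stage idea the abstract describes, approximating the action-value function with a cheap linear model before the full DQN takes over, can be sketched with a semi-gradient Q-learning update on a linear Q(s, a) = w[a] · φ(s). Everything below (the `LinearQ` class, the one-hot features `phi`, and the toy 5-state chain task) is an illustrative assumption for exposition, not the authors' implementation or their CartPole/Mountain Car setup.

```python
import numpy as np

class LinearQ:
    """Linear action-value approximator: Q(s, a) = w[a] . phi(s)."""

    def __init__(self, n_features, n_actions, lr=0.1, gamma=0.99):
        self.w = np.zeros((n_actions, n_features))
        self.lr, self.gamma = lr, gamma

    def q(self, phi):
        # Q-values for every action at the featurized state.
        return self.w @ phi

    def update(self, phi, a, r, phi_next, done):
        # Semi-gradient Q-learning step on the linear model.
        target = r if done else r + self.gamma * np.max(self.w @ phi_next)
        td_error = target - self.w[a] @ phi
        self.w[a] += self.lr * td_error * phi
        return td_error

def phi(s, n_states=5):
    # One-hot state features for a toy 5-state chain.
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

# Toy chain: states 0..4, action 1 moves right, action 0 moves left;
# reaching state 4 yields reward 1 and ends the episode. The behavior
# policy is uniformly random -- Q-learning is off-policy, so the greedy
# values are still learned.
agent = LinearQ(n_features=5, n_actions=2)
rng = np.random.default_rng(0)
for episode in range(300):
    s = 0
    for _ in range(30):
        a = int(rng.integers(2))
        s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r, done = (1.0, True) if s_next == 4 else (0.0, False)
        agent.update(phi(s), a, r, phi(s_next), done)
        if done:
            break
        s = s_next

# After training, the greedy action at state 3 is "move right" (action 1),
# and these weights could then seed or guide the DQN's value network.
print(int(np.argmax(agent.q(phi(3)))))
```

Because the linear model has only `n_actions × n_features` parameters, each update is a single vector operation, which is why such a surrogate can be cheaper to fit than a deep network during the earliest episodes.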

Key words: Deep Q-Network, Linear function, MDP, Reinforcement learning

CLC Number: TP391