Computer Science ›› 2021, Vol. 48 ›› Issue (4): 223-228. DOI: 10.11896/jsjkx.200600177

• Artificial Intelligence •

DQN Algorithm Based on Averaged Neural Network Parameters

HUANG Zhi-yong, WU Hao-lin, WANG Zhuang, LI Hui   

  1. College of Computer Science, Sichuan University, Chengdu 610065, China
  • Received: 2020-06-24 Revised: 2020-09-14 Online: 2021-04-15 Published: 2021-04-09
  • About author: HUANG Zhi-yong, born in 1995, postgraduate. His main research interests include deep reinforcement learning. (huangzhiyong1995@163.com)
    LI Hui, born in 1970, Ph.D, professor. His main research interests include computational intelligence, battlefield simulation and virtual reality.
  • Supported by:
    Joint Foundation of the Ministry of Education (6141A02011607).

Abstract: In deep reinforcement learning, exploring the environment efficiently is a hard problem. The deep Q-network (DQN) algorithm explores the environment with an epsilon-greedy policy, whose epsilon value and decay rate require manual tuning; unsuitable settings lead to poor performance. Moreover, the epsilon-greedy policy is inefficient and cannot solve deep exploration problems. To address this, a deep reinforcement learning algorithm based on averaged neural network parameters (AP-DQN) is proposed. At the beginning of each episode, the algorithm averages the multiple online-network parameter snapshots learned by the agent to obtain a perturbed set of network parameters, and then selects actions through the perturbed network, which improves the agent's exploration efficiency. Experimental results show that AP-DQN explores more efficiently than DQN on a deep exploration problem and achieves higher scores than DQN in five Atari games; the normalized score increases by at most 112.50% and at least 19.07% compared with DQN.
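The parameter-averaging step the abstract describes can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the class name, the snapshot count `k`, and the dict-of-arrays parameter layout are all assumptions.

```python
from collections import deque

import numpy as np


class ParameterAverager:
    """Keep the k most recent online-network parameter snapshots and
    return their element-wise average, which can serve as the perturbed
    network used for action selection at the start of an episode."""

    def __init__(self, k=5):
        # Oldest snapshots are discarded automatically once k is exceeded.
        self.snapshots = deque(maxlen=k)

    def record(self, params):
        """Store a copy of the current online-network parameters.

        params: dict mapping layer name -> np.ndarray of weights.
        """
        self.snapshots.append({name: w.copy() for name, w in params.items()})

    def averaged(self):
        """Element-wise mean over all stored snapshots, per layer."""
        names = self.snapshots[0].keys()
        return {name: np.mean([s[name] for s in self.snapshots], axis=0)
                for name in names}
```

In a training loop, one would call `record` after each learning update and, at each episode start, load the `averaged` parameters into a copy of the network used only for choosing actions, leaving the online network's own parameters untouched.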

Key words: Deep exploration, Deep Q-network, Deep reinforcement learning, Neural network parameters

CLC Number: TP181