Computer Science ›› 2024, Vol. 51 ›› Issue (12): 277-285. doi: 10.11896/jsjkx.240500082

• Artificial Intelligence •

Novel Probability Distribution Update Strategy for Distributed Deep Q-Networks Based on Sigmoid Function

GAO Zhuofan, GUO Wenli   

  1. China Aviation Industry Corporation Luoyang Electric and Optical Equipment Research Institute, Luoyang, Henan 471000, China
  • Received: 2024-05-21  Revised: 2024-09-14  Online: 2024-12-15  Published: 2024-12-10
  • About author: GAO Zhuofan, born in 1999, postgraduate. His main research interests include deep reinforcement learning and artificial intelligence.
    GUO Wenli, born in 1975, Ph.D, professor. His main research interests include overall design of airborne weapon control management and artificial intelligence.
  • Supported by:
    Aeronautical Science Fund (2022Z015013001, 2022Z015013002).

Abstract: Building on the expected-value DQN, the distributed deep Q-network (Dist-DQN) can solve the stochastic-reward problem in complex environments by extending the discrete action reward over a continuous support interval and continuously updating the probability distribution over that support. The reward probability distribution update strategy, a core component of any Dist-DQN implementation, significantly affects the learning efficiency of agents in the environment. To address this issue, a new probability distribution update strategy, Sig-Dist-DQN, is proposed. The strategy takes into account the strength of the correlation between subsets of the reward probability distribution, raising the probability-mass update rate of strongly correlated subsets while lowering that of weakly correlated subsets. Experiments are conducted in an environment provided by OpenAI Gym: the exponential and harmonic-series update strategies show significant run-to-run differences across training sessions, whereas the training curves of the Sig-Dist-QN strategy remain very stable. Compared with the exponential and harmonic-series update strategies, an agent applying Sig-Dist-DQN achieves markedly faster convergence of the loss function and greater stability during learning.
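To make the mechanism concrete, the following is a minimal Python sketch of a sigmoid-weighted probability-mass update of the kind the abstract describes: mass is moved toward the support atom nearest the observed return, and each atom's update rate is scaled by a sigmoid of its distance to that atom, so strongly correlated atoms update faster than weakly correlated ones. All names, constants, and the exact weighting function are illustrative assumptions, not the authors' implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sig_dist_update(probs, support, observed_return, base_lr=0.1, k=2.0):
    # Index of the support atom closest to the observed return.
    target_idx = int(np.argmin(np.abs(support - observed_return)))
    # Distance of every atom to the target atom, normalized to [0, 1].
    dist = np.abs(np.arange(len(support)) - target_idx) / (len(support) - 1)
    # Sigmoid weight: close to 1 near the target (strong correlation),
    # close to 0 far from it (weak correlation).
    weight = sigmoid(k * (1.0 - 2.0 * dist))
    # One-hot target distribution centered on the observed return.
    target = np.zeros_like(probs)
    target[target_idx] = 1.0
    # Per-atom update rate modulated by the sigmoid weight.
    new_probs = probs + base_lr * weight * (target - probs)
    # Renormalize so the result is again a valid probability distribution.
    return new_probs / new_probs.sum()

# Toy usage: 51 atoms spanning returns in [-10, 10], starting from a uniform distribution.
support = np.linspace(-10.0, 10.0, 51)
probs = np.full(51, 1.0 / 51)
probs = sig_dist_update(probs, support, observed_return=3.7)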

Key words: Distributed deep Q-network, Reward interval continuation, Probability distribution update, Learning rate, Training stability

CLC Number: TP181