Computer Science ›› 2024, Vol. 51 ›› Issue (12): 277-285. doi: 10.11896/jsjkx.240500082

• Artificial Intelligence •

Novel Probability Distribution Update Strategy for Distributed Deep Q-Networks Based on Sigmoid Function

GAO Zhuofan, GUO Wenli   

  1. China Aviation Industry Corporation Luoyang Electric and Optical Equipment Research Institute, Luoyang, Henan 471000, China
  • Received: 2024-05-21  Revised: 2024-09-14  Online: 2024-12-15  Published: 2024-12-10
  • Corresponding author: GUO Wenli (13526906198@139.com)
  • About author: (2459194015@qq.com)
  • Supported by:
    Aeronautical Science Fund (2023Z015013001, 2022Z015013002)

Novel Probability Distribution Update Strategy for Distributed Deep Q-Networks Based on Sigmoid Function

GAO Zhuofan, GUO Wenli   

  1. China Aviation Industry Corporation Luoyang Electric and Optical Equipment Research Institute, Luoyang, Henan 471000, China
  • Received: 2024-05-21  Revised: 2024-09-14  Online: 2024-12-15  Published: 2024-12-10
  • About author: GAO Zhuofan, born in 1999, postgraduate. His main research interests include deep reinforcement learning and artificial intelligence.
    GUO Wenli, born in 1975, Ph.D, professor. His main research interests include overall design of airborne weapon control management and artificial intelligence.
  • Supported by:
    Aeronautical Science Fund (2022Z015013001, 2022Z015013002).

Abstract: The distributed deep Q-network (Dist-DQN) builds on the conventional expected-value deep Q-network by continuizing discrete action rewards over an interval, solving the stochastic reward problem of complex environments by continually updating the probability distribution over the support interval. The reward-probability distribution update strategy, an essential function of any Dist-DQN implementation, significantly affects the agent's learning efficiency in the environment. To address this issue, a new Sig-Dist-DQN probability distribution update strategy is proposed. It takes into account how strongly each probability support is correlated with the observed reward, raising the probability-mass update rate of strongly correlated supports while lowering that of weakly correlated ones. Experiments in environments provided by OpenAI Gym show that the exponential and harmonic-series update strategies vary considerably from one training run to the next, whereas the training curves of the Sig-Dist-DQN strategy are very stable. Compared with the exponential and harmonic-series update strategies, an agent applying Sig-Dist-DQN shows markedly faster convergence of the loss function and a more stable convergence process during learning.

Keywords: Distributed deep Q-network, Continuization of reward intervals, Probability distribution update, Learning efficiency, Training stability

Abstract: Based on the expected-value DQN, the distributed deep Q-network (Dist-DQN) solves the stochastic reward problem in complex environments by continuizing discrete action rewards over an interval and continually updating the probability distribution over the support interval. The reward-probability distribution update strategy, an important function in any Dist-DQN implementation, significantly affects the learning efficiency of agents in the environment. A new Sig-Dist-DQN probability distribution update strategy is proposed to address this issue. The strategy comprehensively considers the strength of correlation between each probability support and the observed reward, raising the probability-mass update rate of strongly correlated supports while reducing the update rate of weakly correlated ones. Experiments conducted in environments provided by OpenAI Gym show that the exponential and harmonic-series update strategies differ considerably from one training run to another, whereas the training curves of the Sig-Dist-DQN strategy are very stable. Compared with the exponential and harmonic-series update strategies, an agent applying Sig-Dist-DQN achieves significantly faster convergence of the loss function and a more stable convergence process during learning.
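The update rule itself is not given on this page, so the following is only a minimal NumPy sketch of the idea the abstract describes: each support atom's probability-mass update rate is scaled by a sigmoid weight reflecting how strongly that atom relates to the observed reward. The function name sig_dist_update, the distance-based correlation proxy, and all hyperparameters are illustrative assumptions, not the authors' implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sig_dist_update(support, probs, observed_reward, lr=0.1, temperature=1.0):
    """Shift probability mass toward support atoms close to the observed reward.

    Atoms assumed to be strongly correlated with the observed reward (here simply
    the atoms nearest to it on the support interval) receive a sigmoid weight near
    0.5, while distant, weakly correlated atoms receive a weight near 0, so their
    probability mass changes more slowly.
    """
    distance = np.abs(support - observed_reward)       # illustrative correlation proxy: distance on the support
    weight = sigmoid(-distance / temperature)          # strong correlation -> larger weight, weak -> smaller
    target = np.zeros_like(probs)
    target[np.argmin(distance)] = 1.0                  # one-hot target at the atom closest to the reward
    updated = probs + lr * weight * (target - probs)   # per-atom update rate scaled by the sigmoid weight
    return updated / updated.sum()                     # renormalize so the result stays a probability distribution

# Example: 51 atoms spanning a reward interval [-10, 10], starting from a uniform distribution
support = np.linspace(-10.0, 10.0, 51)
probs = np.full(51, 1.0 / 51)
probs = sig_dist_update(support, probs, observed_reward=3.2)

The exponential and harmonic-series baselines mentioned in the abstract would presumably replace the sigmoid weighting above with their respective schedules; the sketch is only meant to show where such a weighting enters the distribution update.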

Key words: Distributed deep Q network, Continuation of reward intervals, Updating the probability distribution, Learning efficiency, Training stability

CLC Number: 

  • TP181