Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230300170-9. doi: 10.11896/jsjkx.230300170

• Artificial Intelligence •

Survey of Multi-agent Deep Reinforcement Learning Based on Value Function Factorization

GAO Yuzhao, NIE Yiming

  1. National Innovation Institute of Defense Technology, Academy of Military Science, Beijing 100071, China
  • Published: 2024-06-06
  • Corresponding author: NIE Yiming (13370154812@189.cn)
  • About author: GAO Yuzhao, born in 1994, postgraduate (958255669@qq.com). His main research interests include multi-agent deep reinforcement learning, UGV, and task planning.
    NIE Yiming, born in 1982, associate research fellow. His main research interests include intelligent unmanned systems and UGV.

Abstract: Multi-agent deep reinforcement learning extends deep reinforcement learning to multi-agent problems. Among these methods, those based on value function factorization have achieved strong performance and are currently a focus of research and application. This paper introduces the main principles and framework of value-function-factorization-based multi-agent deep reinforcement learning. Drawing on recent related work, it identifies three research hotspots: improving the fitting ability of the mixing network, improving convergence, and improving algorithm scalability, and analyzes the causes of these three problems in terms of algorithmic constraints, environmental complexity, and neural network limitations. Existing studies are then classified by the problem they address and the method they use; the common points of similar methods are summarized, and the advantages and disadvantages of different methods are analyzed. Finally, applications of value-function-factorization-based multi-agent deep reinforcement learning in two active fields, network node control and unmanned formation control, are described.

Key words: Multi-agent deep reinforcement learning, Value function factorization, Fitting ability, Convergence effect, Scalability
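The core idea the survey builds on, value function factorization, can be illustrated with a minimal sketch: VDN-style factorization sums the agents' individual utilities into a joint value, while QMIX-style monotonic mixing combines them with non-negative weights so that each agent's greedy action stays consistent with the greedy joint action (the Individual-Global-Max principle). The function names and toy numbers below are illustrative assumptions, not drawn from any cited implementation.

```python
# Toy sketch of value function factorization in cooperative MARL.
# Each agent i computes an individual utility Q_i(tau_i, a_i); a mixing
# function combines them into a joint value Q_tot. The IGM principle
# requires that per-agent greedy selection on each Q_i agrees with
# greedy selection on Q_tot.

def vdn_mix(agent_qs):
    """VDN: Q_tot is the plain sum of individual utilities."""
    return sum(agent_qs)

def qmix_style_mix(agent_qs, weights, bias=0.0):
    """QMIX-style monotonic mixing: non-negative weights guarantee
    dQ_tot/dQ_i >= 0, a sufficient condition for IGM."""
    assert all(w >= 0 for w in weights), "monotonicity requires w_i >= 0"
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias

# Two agents, two actions each: per-agent Q tables.
q1 = [0.2, 1.0]   # agent 1 prefers action 1
q2 = [0.7, 0.1]   # agent 2 prefers action 0

# Greedy individual actions.
greedy = (max(range(2), key=q1.__getitem__),
          max(range(2), key=q2.__getitem__))

# Joint values under monotonic mixing: the greedy joint action also
# maximizes Q_tot, illustrating IGM consistency.
q_tot = {(a1, a2): qmix_style_mix([q1[a1], q2[a2]], weights=[0.5, 1.5])
         for a1 in range(2) for a2 in range(2)}
best_joint = max(q_tot, key=q_tot.get)
assert best_joint == greedy   # (1, 0): each agent's own argmax
```

The non-negative-weight constraint is exactly the source of the "mixing network fitting ability" problem the survey discusses: it guarantees IGM but restricts the class of joint value functions the mixer can represent.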

CLC number: TP181