Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230300170-9. DOI: 10.11896/jsjkx.230300170

• Artificial Intelligence •

Survey of Multi-agent Deep Reinforcement Learning Based on Value Function Factorization

GAO Yuzhao, NIE Yiming   

  1. National Innovation Institute of Defense Technology, Academy of Military Science, Beijing 100071, China
  • Published: 2024-06-06
  • About author: GAO Yuzhao, born in 1994, postgraduate. His main research interests include multi-agent deep reinforcement learning, UGVs and task planning.
    NIE Yiming, born in 1982, associate research fellow. His main research interests include intelligent unmanned systems and UGVs.

Abstract: Multi-agent deep reinforcement learning extends deep reinforcement learning to multi-agent problems, and among its branches, methods based on value function factorization have achieved strong performance and are currently a hotspot for research and application. This paper introduces the main principles and framework of multi-agent deep reinforcement learning based on value function factorization, in which each agent learns an individual utility and a mixing network combines these utilities into a joint value for centralized training. Drawing on recent related research, three research hotspots are summarized: improving the fitting ability of the mixing network, improving convergence, and improving the scalability of algorithms; the causes of these three problems are analyzed in terms of algorithmic constraints, environmental complexity and the limitations of neural networks. Existing studies are classified by the problem they address and the method they use, the common points of similar methods are summarized, and the advantages and disadvantages of different methods are analyzed. Finally, the application of value-function-factorization methods in two active fields, network node control and unmanned formation control, is reviewed.
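
The core of these factorization methods is the Individual-Global-Max (IGM) condition: the joint greedy action must coincide with the agents' individually greedy actions. QMIX guarantees IGM by making the joint value monotonic in every individual utility, enforced through state-conditioned hypernetworks whose mixing weights are constrained to be non-negative. As a concrete illustration only, the following is a minimal sketch of such a monotonic mixing network, assuming PyTorch; the class name MonotonicMixer, the layer sizes and the usage at the end are hypothetical and not taken from any surveyed paper.

```python
# Minimal sketch of a QMIX-style monotonic mixing network (illustrative only).
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Mixes per-agent utilities Q_i into a joint value Q_tot.

    Monotonicity (dQ_tot/dQ_i >= 0) is enforced by generating the mixing
    weights with hypernetworks conditioned on the global state and taking
    their absolute value, so IGM holds by construction.
    """
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map the global state to mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        bs = agent_qs.size(0)
        # abs() keeps the mixing weights non-negative (monotonic mixing).
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.view(bs, 1, -1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(bs, 1)  # Q_tot

# Hypothetical usage: mix 3 agents' utilities under a 16-dim global state.
mixer = MonotonicMixer(n_agents=3, state_dim=16)
q_tot = mixer(torch.randn(8, 3), torch.randn(8, 16))
print(q_tot.shape)  # torch.Size([8, 1])
```

Setting every first-layer weight to a fixed identity-like value instead of a learned one recovers the simpler VDN-style additive decomposition, which is where the mixing network's limited fitting ability (the first hotspot above) becomes visible.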

Key words: Multi-agent deep reinforcement learning, Value function factorization, Fitting ability, Convergence effect, Scalability

CLC Number: TP181