Computer Science, 2022, Vol. 49, Issue (9): 172-182. doi: 10.11896/jsjkx.210800112

• Artificial Intelligence

Overview of Multi-agent Deep Reinforcement Learning Based on Value Factorization

XIONG Li-qin, CAO Lei, LAI Jun, CHEN Xi-liang   

  1. College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2021-08-12  Revised: 2021-12-29  Online: 2022-09-15  Published: 2022-09-09
  • About the authors: XIONG Li-qin, born in 1997, postgraduate. Her main research interests include multi-agent deep reinforcement learning and intelligent command and control.
    CAO Lei, born in 1965, Ph.D., professor, Ph.D. supervisor. His main research interests include machine learning, command information systems and intelligent decision making.

Abstract: Multi-agent deep reinforcement learning (MADRL) based on value factorization is one family of MADRL algorithms and a research hotspot in the field. Under certain constraints, the joint action-value function of a multi-agent system is factorized into a combination of individual action-value functions, which effectively addresses two problems of multi-agent systems: environmental non-stationarity and the exponential growth of the joint action space. First, this paper explains why value function factorization is needed and introduces the basic theory of multi-agent deep reinforcement learning. Second, according to whether additional mechanisms are introduced, and which ones, value-factorization-based MADRL algorithms are divided into three categories: simple factorization, factorization based on the individual-global-max (IGM) principle, and factorization based on the attention mechanism. Then, following this classification, the paper introduces several representative algorithms in detail and compares their strengths and weaknesses. Finally, it briefly describes the applications and development prospects of these algorithms.
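The factorization idea summarized in the abstract can be illustrated with a minimal, self-contained sketch. It assumes tabular per-agent Q-values for a single fixed state and compares four hypothetical mixing functions: a plain sum (in the spirit of simple, VDN-style factorization), a state-weighted sum with positive weights (a one-layer simplification of QMIX-style monotonic mixing), a softmax attention-weighted sum (loosely modeled on attention-based mixers such as Qatten), and a deliberately non-monotonic counterexample. The weight functions, agent keys, state value and Q-values are illustrative assumptions, not the networks used by any of the surveyed algorithms. A brute-force check over all |A|^n joint actions tests the IGM condition, i.e. whether each agent's own greedy action jointly maximizes Q_tot.

```python
import itertools
import math
import random


def joint_action_count(n_agents: int, n_actions: int) -> int:
    """Joint actions a centralized Q_tot would have to enumerate: |A|^n."""
    return n_actions ** n_agents


def factorized_action_count(n_agents: int, n_actions: int) -> int:
    """Per-agent evaluations for decentralized greedy selection: n * |A|."""
    return n_agents * n_actions


def additive_mix(qs, state):
    """Simple factorization (VDN-style): Q_tot is the plain sum of the Q_i."""
    return sum(qs)


def monotonic_mix(qs, state):
    """QMIX-like mixer reduced to one linear layer. A toy, hypothetical
    'hypernetwork' maps the state to weights; abs() + 0.1 keeps them
    strictly positive, so Q_tot increases in every Q_i."""
    weights = [abs(math.sin(state + i)) + 0.1 for i in range(len(qs))]
    return sum(w * q for w, q in zip(weights, qs)) + math.cos(state)


def attention_mix(qs, state):
    """Attention-style mixer (Qatten-like sketch): softmax over scores between
    a state query and hypothetical per-agent keys. Softmax weights are > 0."""
    keys = [0.5 * (i + 1) for i in range(len(qs))]  # hypothetical agent keys
    scores = [state * k for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    return sum((e / total) * q for e, q in zip(exps, qs))


def non_monotonic_mix(qs, state):
    """Counterexample: the negative weight on agent 1 breaks monotonicity."""
    return qs[0] - qs[1]


def greedy_joint_action(q_tables):
    """Decentralized execution: each agent takes the argmax of its own Q_i."""
    return tuple(max(range(len(t)), key=t.__getitem__) for t in q_tables)


def satisfies_igm(mixer, q_tables, state, tol=1e-9):
    """Brute-force IGM check for one state: the per-agent greedy actions must
    reach the maximum of Q_tot over all |A|^n joint actions."""
    def q_tot(joint):
        return mixer([t[a] for t, a in zip(q_tables, joint)], state)

    all_joint = itertools.product(*(range(len(t)) for t in q_tables))
    best = max(q_tot(j) for j in all_joint)
    return q_tot(greedy_joint_action(q_tables)) >= best - tol


if __name__ == "__main__":
    rng = random.Random(0)
    n_agents, n_actions, state = 3, 4, 0.7
    # Hypothetical per-agent Q-values for a single fixed state.
    q_tables = [[rng.uniform(-1.0, 1.0) for _ in range(n_actions)]
                for _ in range(n_agents)]
    print(joint_action_count(n_agents, n_actions))       # 64
    print(factorized_action_count(n_agents, n_actions))  # 12
    for mixer in (additive_mix, monotonic_mix, attention_mix, non_monotonic_mix):
        print(mixer.__name__, satisfies_igm(mixer, q_tables, state))
    # Expected: True for the three monotonic mixers, False for non_monotonic_mix.
```

For a fixed state, every mixer except the counterexample is increasing in each Q_i, so greedy decentralized selection needs only n|A| evaluations instead of |A|^n while still recovering the joint optimum. Monotonicity is a sufficient, not a necessary, condition for IGM, which is why the IGM-based category is treated separately from simple factorization.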

Key words: value function factorization, MADRL, attention mechanism, IGM principle

CLC Number: TP181