Computer Science ›› 2019, Vol. 46 ›› Issue (8): 1-8. doi: 10.11896/j.issn.1002-137X.2019.08.001

• Big Data & Data Science •

Overview on Multi-agent Reinforcement Learning

DU Wei1, DING Shi-fei1,2   

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2018-07-06 Online: 2019-08-15 Published: 2019-08-15

Abstract: A multi-agent system is a distributed computing technology that can be used to solve problems in various fields, including robotic systems, distributed decision-making, traffic control, and business management. Multi-agent reinforcement learning is an important branch of multi-agent system research. It applies reinforcement learning and game theory to multi-agent systems, enabling multiple agents to complete more complicated tasks through interaction and decision-making in higher-dimensional, dynamic real-world scenes. This paper reviews recent research progress and developments in multi-agent reinforcement learning. First, the theoretical background of multi-agent reinforcement learning is introduced, and the learning objectives and classical algorithms proposed in the literature are reviewed; these algorithms are applied to fully cooperative, fully competitive, and more general (neither fully cooperative nor fully competitive) tasks, respectively. Second, the latest developments in multi-agent reinforcement learning are summarized. As deep learning has matured in recent years, researchers facing increasingly complex real-world tasks have used it to automatically learn abstract features from massive input data, and then used the learned features to optimize agents' decision-making in reinforcement learning. Recently, researchers have combined deep learning with other techniques to improve and extend algorithms in several respects, such as scalability, agent intent, incentive mechanisms, and environmental frameworks. Finally, the application prospects of multi-agent reinforcement learning are summarized. Multi-agent reinforcement learning has made good progress in robotic systems, human-machine games, and autonomous driving, and is expected to be applied to resource management, transportation systems, medical treatment, and finance in the future.

Key words: Deep learning, Game theory, Multi-agent reinforcement learning, Multi-agent systems, Reinforcement learning

CLC Number: 

  • TP181