Computer Science ›› 2019, Vol. 46 ›› Issue (8): 1-8. doi: 10.11896/j.issn.1002-137X.2019.08.001

• Big Data & Data Science •

Overview on Multi-agent Reinforcement Learning

DU Wei1, DING Shi-fei1,2   

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2018-07-06 Online: 2019-08-15 Published: 2019-08-15

Abstract: A multi-agent system is a distributed computing technology that can be used to solve problems in various fields, including robotic systems, distributed decision-making, traffic control and business management. Multi-agent reinforcement learning is an important branch of multi-agent system research. It applies reinforcement learning and game theory to multi-agent systems, enabling multiple agents to complete more complicated tasks through interaction and decision-making in higher-dimensional, dynamic real-world scenes. This paper reviews recent research progress in multi-agent reinforcement learning. First, the theoretical background of multi-agent reinforcement learning is introduced, and the learning objectives and classical algorithms proposed in the literature are reviewed; these algorithms apply to fully cooperative, fully competitive and more general (neither fully cooperative nor fully competitive) tasks, respectively. Second, the latest developments in multi-agent reinforcement learning are summarized. As deep learning has matured in recent years, researchers working on increasingly complex real-world tasks have used it to automatically learn abstract features from massive input data, and then used the learned features to optimize the decision-making of agents in reinforcement learning. Recently, researchers have combined deep learning with other techniques to improve algorithms in several respects, such as scalability, agent intent, incentive mechanisms and environmental frameworks. Finally, the application prospects of multi-agent reinforcement learning are summarized. Multi-agent reinforcement learning has made good progress in robotic systems, human-machine games and autonomous driving, and will be applied to resource management, transportation systems, medical treatment and finance in the future.

Key words: Reinforcement learning, Multi-agent systems, Game theory, Multi-agent reinforcement learning, Deep learning

CLC Number: TP181