Computer Science ›› 2019, Vol. 46 ›› Issue (8): 1-8. doi: 10.11896/j.issn.1002-137X.2019.08.001

• Big Data & Data Science •

Overview on Multi-agent Reinforcement Learning

DU Wei1, DING Shi-fei1,2

  1. (School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China)1
    (Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)2
  • Received: 2018-07-06  Published: 2019-08-15
  • Corresponding author: DING Shi-fei (born 1963), male, postdoctoral fellow, professor, CCF member; his main research interests include machine learning and artificial intelligence. E-mail: dingsf@cumt.edu.cn
  • About the author: DU Wei (born 1994), male, master's candidate; his main research interest is deep reinforcement learning. E-mail: 1394471165@qq.com
  • Supported by: National Natural Science Foundation of China (61672522, 61379101) and National Key Basic Research Program of China (973 Program) (2013CB329502)


Abstract: Multi-agent systems are a distributed computing technology that can be used to solve problems in various fields, including robot systems, distributed decision-making, traffic control and business management. Multi-agent reinforcement learning is an important branch of multi-agent systems research. It applies reinforcement learning and game theory to multi-agent systems, enabling multiple agents to complete more complicated tasks through interaction and decision-making in higher-dimensional, dynamic real-world scenarios. This paper reviews the recent research progress and development of multi-agent reinforcement learning. Firstly, it introduces the theoretical background of multi-agent reinforcement learning and reviews the learning objectives and classical algorithms proposed in the literature, which are applied to fully cooperative, fully competitive and more general (neither cooperative nor competitive) tasks, respectively. Secondly, it summarizes the latest developments in the field. With the maturity of deep learning in recent years, researchers have used deep learning to automatically learn abstract features from massive input data in increasingly complex real-world tasks, and then used these features to optimize agents' decision-making in reinforcement learning. Recently, researchers have combined deep learning and other techniques to improve and innovate algorithms in different respects, such as scalability, agent intent, reward mechanisms and environmental frameworks. Finally, the application prospects and development trends of multi-agent reinforcement learning are summarized. Multi-agent reinforcement learning has made good progress in fields such as robot systems, human-machine gaming and autonomous driving, and it will be applied more widely in resource management, transportation systems, medical treatment, finance and other fields in the future.
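To make the basic setting concrete, the following is a minimal, illustrative sketch (not taken from the paper) of two independent Q-learning agents in a repeated two-action coordination game, the simplest fully cooperative case the survey describes. The payoff matrix, hyperparameters and all names below are assumptions chosen for demonstration, not the authors' algorithm.

    # Illustrative sketch only: two independent Q-learners in a repeated
    # 2x2 coordination game. Payoff matrix and hyperparameters are assumed
    # values for demonstration.
    import random

    N_ACTIONS = 2
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration

    # Fully cooperative task: both agents receive the same payoff and must
    # coordinate on action 1 to earn the higher reward.
    PAYOFF = [[1.0, 0.0],
              [0.0, 2.0]]                   # PAYOFF[a1][a2], shared by both agents

    q1 = [0.0] * N_ACTIONS                  # stateless Q-table of agent 1
    q2 = [0.0] * N_ACTIONS                  # stateless Q-table of agent 2

    def choose(q):
        """Epsilon-greedy action selection over a stateless Q-table."""
        if random.random() < EPSILON:
            return random.randrange(N_ACTIONS)
        return max(range(N_ACTIONS), key=lambda a: q[a])

    for _ in range(5000):
        a1, a2 = choose(q1), choose(q2)
        r = PAYOFF[a1][a2]
        # Each agent updates as if the environment were stationary; the other
        # agent's simultaneous learning breaks that assumption, which is the
        # core difficulty of multi-agent reinforcement learning.
        q1[a1] += ALPHA * (r + GAMMA * max(q1) - q1[a1])
        q2[a2] += ALPHA * (r + GAMMA * max(q2) - q2[a2])

    print("Agent 1 Q-values:", q1)
    print("Agent 2 Q-values:", q2)

Because each agent treats the others as part of the environment, convergence to the better joint action is not guaranteed; this non-stationarity is what motivates the game-theoretic algorithms (e.g., minimax-Q, Nash-Q) and their deep variants reviewed in the survey.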

Key words: Reinforcement learning, Multi-agent systems, Game theory, Multi-agent reinforcement learning, Deep learning

CLC number: TP181