Computer Science ›› 2019, Vol. 46 ›› Issue (8): 1-8. doi: 10.11896/j.issn.1002-137X.2019.08.001

• Big Data & Data Science •

Overview on Multi-agent Reinforcement Learning

DU Wei1, DING Shi-fei1,2

  1. (School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China)1
    (Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)2
  • Received: 2018-07-06 Online: 2019-08-15 Published: 2019-08-15
  • Corresponding author: DING Shi-fei (born in 1963), male, postdoctoral fellow, professor, CCF member; his main research interests include machine learning and artificial intelligence. E-mail: dingsf@cumt.edu.cn
  • About the author: DU Wei (born in 1994), male, master's student; his main research interest is deep reinforcement learning. E-mail: 1394471165@qq.com
  • Supported by:
    National Natural Science Foundation of China (61672522, 61379101) and National Program on Key Basic Research Project of China (973 Program) (2013CB329502)



Abstract: A multi-agent system is a distributed computing technology that can be used to solve problems in many fields, including robot systems, distributed decision-making, traffic control and business management. Multi-agent reinforcement learning is an important branch of multi-agent system research. It applies reinforcement learning and game theory to multi-agent systems, enabling multiple agents to complete more complicated tasks through interaction and decision-making in higher-dimensional, dynamic real-world scenes. This paper reviews the recent research progress and development of multi-agent reinforcement learning. Firstly, the theoretical background of multi-agent reinforcement learning is introduced, and the learning objectives and classical algorithms proposed in the literature are reviewed; these algorithms address fully cooperative, fully competitive and more general (neither cooperative nor competitive) tasks, respectively. Secondly, the latest developments in multi-agent reinforcement learning are surveyed. With the maturity of deep learning in recent years, researchers have used deep learning to automatically learn abstract features from massive input data in increasingly complex realistic tasks, and have used these features to optimize the decision-making of agents in reinforcement learning. Recently, researchers have combined deep learning with other techniques to improve and extend algorithms in several respects, such as scalability, agent intent, reward mechanisms and environment frameworks. Finally, the application prospects and development trends of multi-agent reinforcement learning are summarized. Multi-agent reinforcement learning has made good progress in robot systems, human-machine games and autonomous driving, and in the future it will be applied more widely in resource management, transportation systems, medical treatment, finance and other fields.

Key words: Reinforcement learning, Multi-agent systems, Game theory, Multi-agent reinforcement learning, Deep learning
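Many of the classical multi-agent algorithms surveyed in the paper extend tabular Q-learning to multiple learners. As a minimal hedged sketch (the game, payoffs, hyperparameters and names below are invented for illustration and are not taken from the paper), two independent Q-learners can be trained on a fully cooperative 2x2 matrix game in which both agents receive the same team reward and must coordinate on the joint action (1, 1):

```python
import random

# Illustrative sketch only: two independent Q-learners in a fully
# cooperative 2x2 matrix game. Payoffs and hyperparameters are invented
# for this example, not taken from the surveyed algorithms.

# Shared team reward for the joint action (a0, a1): only coordinated
# play of (1, 1) pays off, so both agents must learn to pick action 1.
REWARD = [[0.0, 0.0],
          [0.0, 1.0]]

def train(episodes=10000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]; the game is stateless
    for _ in range(episodes):
        joint = []
        for agent in range(2):
            if rng.random() < epsilon:                       # explore
                joint.append(rng.randrange(2))
            else:                                            # exploit
                joint.append(max(range(2), key=lambda a: q[agent][a]))
        r = REWARD[joint[0]][joint[1]]                       # shared reward
        for agent, a in enumerate(joint):
            # Stateless Q-learning update: move q toward the sampled reward.
            q[agent][a] += alpha * (r - q[agent][a])
    return q

if __name__ == "__main__":
    q = train()
    greedy = [max(range(2), key=lambda a: q[agent][a]) for agent in range(2)]
    print(greedy)  # prints [1, 1]: both agents coordinate on action 1
```

Because each independent learner treats its partner as part of a non-stationary environment, this simple scheme can fail to coordinate when the game has several optima; that weakness motivates the distributed and lenient cooperative variants discussed in the survey.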

CLC number: TP181