Computer Science, 2025, Vol. 52, Issue 10: 176-189. doi: 10.11896/jsjkx.241000047
黄炜烨, 陈希亮, 赖俊
HUANG Weiye, CHEN Xiliang, LAI Jun
Abstract: Human-machine collaboration has attracted widespread attention in recent years, and multi-agent reinforcement learning has shown significant advantages and application potential in this field. First, this paper introduces the basic concepts and important models of multi-agent reinforcement learning, analyzes its advantages in human-machine collaboration tasks, and classifies human-machine collaboration into three types. Second, it discusses the three training paradigms of multi-agent reinforcement learning, namely centralized training with centralized execution, decentralized training with decentralized execution, and centralized training with decentralized execution, together with the scenarios to which each paradigm is suited. Then, in view of the problems of existing agent training methods in human-machine collaboration, such as poor generalization, insufficient diversity among training partners, and limited ability to adapt to human partners, it reviews research progress on agent training methods for human-machine collaboration from the perspective of whether human data are used. Finally, it discusses the application scenarios and future development trends of human-machine collaboration, and proposes possible solutions and research directions.
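As a rough orientation for readers, the minimal Python/PyTorch sketch below illustrates the structural idea behind centralized training with decentralized execution (CTDE), one of the three paradigms named in the abstract. All class names, network sizes, and dimensions here are illustrative assumptions and do not come from the surveyed papers: during training, a centralized critic conditions on the joint observations of all agents, while at execution time each actor acts from its own local observation alone.

# Minimal CTDE sketch (assumed names and sizes; illustrative only).
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_AGENTS = 8, 4, 2  # hypothetical problem sizes

class Actor(nn.Module):
    # Decentralized actor: conditions only on its own local observation.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM))

    def forward(self, local_obs):
        return torch.softmax(self.net(local_obs), dim=-1)

class CentralCritic(nn.Module):
    # Centralized critic: sees the joint observation of all agents during
    # training and is discarded at execution time.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM * N_AGENTS, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, joint_obs):
        return self.net(joint_obs)

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# Training phase: the critic evaluates joint information.
joint_obs = torch.randn(1, OBS_DIM * N_AGENTS)
value = critic(joint_obs)

# Execution phase: each agent acts from local information only.
local_obs = torch.randn(1, OBS_DIM)
action_probs = actors[0](local_obs)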