Computer Science, 2025, Vol. 52, Issue 10: 176-189. doi: 10.11896/jsjkx.241000047
黄炜烨, 陈希亮, 赖俊
HUANG Weiye, CHEN Xiliang, LAI Jun
Abstract: Human-machine collaboration has attracted widespread attention in recent years, and multi-agent reinforcement learning has shown significant advantages and application potential in this field. First, this paper introduces the basic concepts and important models of multi-agent reinforcement learning, analyzes its advantages in human-machine collaboration tasks, and classifies human-machine collaboration into three types. Second, it discusses the three training paradigms of multi-agent reinforcement learning, namely centralized training with centralized execution, decentralized training with decentralized execution, and centralized training with decentralized execution, together with the scenarios to which each paradigm is suited. Then, in view of the problems of existing agent training methods in human-machine collaboration, such as poor generalization, insufficient diversity among training partners, and limited ability to adapt to human partners, it reviews research progress on agent training methods for human-machine collaboration from the perspective of whether human data are used. Finally, it discusses the application scenarios and future development trends of human-machine collaboration, and proposes possible solutions and research directions.
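As a rough orientation for readers, the minimal Python/PyTorch sketch below illustrates the structural idea behind centralized training with decentralized execution (CTDE), one of the three paradigms named in the abstract. All class names, network sizes, and dimensions here are illustrative assumptions and do not come from the surveyed papers: during training, a centralized critic conditions on the joint observations of all agents, while at execution time each actor acts from its own local observation alone.

# Minimal CTDE sketch (assumed names and sizes; illustrative only).
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_AGENTS = 8, 4, 2  # hypothetical problem sizes

class Actor(nn.Module):
    # Decentralized actor: conditions only on its own local observation.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM))

    def forward(self, local_obs):
        return torch.softmax(self.net(local_obs), dim=-1)

class CentralCritic(nn.Module):
    # Centralized critic: sees the joint observation of all agents during
    # training and is discarded at execution time.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM * N_AGENTS, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, joint_obs):
        return self.net(joint_obs)

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# Training phase: the critic evaluates joint information.
joint_obs = torch.randn(1, OBS_DIM * N_AGENTS)
value = critic(joint_obs)

# Execution phase: each agent acts from local information only.
local_obs = torch.randn(1, OBS_DIM)
action_probs = actors[0](local_obs)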