Computer Science, 2025, Vol. 52, Issue (10): 176-189. doi: 10.11896/jsjkx.241000047

• Artificial Intelligence •

Review of Research on Agent Training Methods Toward Human-Agent Collaboration

HUANG Weiye, CHEN Xiliang, LAI Jun   

  1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Received: 2024-10-11 Revised: 2025-02-05 Online: 2025-10-15 Published: 2025-10-14
  • About author: HUANG Weiye, born in 1999, postgraduate. His main research interest is multi-agent reinforcement learning.
    CHEN Xiliang, born in 1985, Ph.D, associate professor. His main research interests include command information system engineering and deep reinforcement learning.
  • Supported by:
    National Natural Science Foundation of China (62273356).

Abstract: Human-agent collaboration has received widespread attention in recent years, and multi-agent reinforcement learning has demonstrated significant advantages and application potential in this field. This paper first introduces the basic concepts and important models of multi-agent reinforcement learning, analyzes its advantages in human-agent collaborative tasks, and describes three types of human-agent collaboration. Second, it explores three training paradigms of multi-agent reinforcement learning: centralized training with centralized execution, decentralized training with decentralized execution, and centralized training with decentralized execution, along with the applicable scenarios for each. Then, addressing the problems faced by agent training methods for human-agent collaboration, such as poor generalization ability, a lack of diversity in training partners, and an inability to adapt well to human partners, it summarizes research progress on these methods according to whether human data is used. Finally, it discusses the application scenarios and future development trends of human-agent collaboration, and proposes possible solutions and research directions.

Key words: Artificial intelligence, Multi-agent reinforcement learning, Human-agent collaboration, Zero-shot coordination

CLC Number: 

  • TP181