Computer Science ›› 2025, Vol. 52 ›› Issue (1): 323-330. doi: 10.11896/jsjkx.240800072

• Artificial Intelligence •

Multi-agent Pursuit Decision-making Method Based on Hybrid Imitation Learning

WANG Yanning1,2, ZHANG Fengdi1,2, XIAO Dengmin3, SUN Zhongqi4   

    1 Beijing Aerospace Automatic Control Institute,Beijing 100854,China
    2 National Key Laboratory of Science and Technology on Aerospace Intelligence Control,Beijing 100854,China
    3 China Ship Intelligence and Marine Innovation Research Institute Co.,Ltd.,Beijing 100094,China
    4 School of Automation,Beijing Institute of Technology,Beijing 100081,China
  • Received:2024-08-13 Revised:2024-09-23 Online:2025-01-15 Published:2025-01-09
  • About author:WANG Yanning,born in 1981,master.His main research interest is reinforcement learning.
    XIAO Dengmin,born in 1999,master.Her main research interests include imitation learning and reinforcement learning.

Abstract: To address the limitations of traditional imitation learning in handling diverse expert trajectories, in particular the difficulty of effectively integrating fixed-modality expert data of varying quality, this paper combines the multiple-trajectory generative adversarial imitation learning (MT-GAIL) method with temporal-difference-error behavioral cloning (TD-BC) to construct a hybrid imitation learning framework. The framework improves the model's adaptability to complex and changing expert strategies as well as its robustness in extracting useful information from low-quality data. The model it produces can be used directly for reinforcement learning: with only minor adjustment and fine-tuning, it yields a usable reinforcement learning model grounded in expert experience. Experiments in a two-dimensional pursuit scenario with mixed dynamic and static targets show that the proposed method effectively assimilates expert knowledge and provides a high-starting-point, effective initial model for subsequent reinforcement learning training.
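The abstract gives no implementation details, so the following is only a minimal sketch of how a GAIL-style generator objective could be combined with a temporal-difference-error-weighted behavioral cloning term, in the spirit of the MT-GAIL + TD-BC framework described above. It is written in PyTorch; the network architectures, the exponential weighting of the TD error, and names such as Policy, Discriminator and hybrid_imitation_loss are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: GAIL-style generator loss + TD-error-weighted behavioral
# cloning. All names, architectures and the weighting scheme are assumptions.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)  # deterministic action in [-1, 1]

class Discriminator(nn.Module):
    """Scores (state, action) pairs; expert-like pairs should score near 1."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 1))
    def forward(self, obs, act):
        return torch.sigmoid(self.net(torch.cat([obs, act], dim=-1)))

def hybrid_imitation_loss(policy, disc, expert_obs, expert_act, td_error):
    """GAIL generator term plus a BC term weighted by the per-sample TD error,
    used here as a crude proxy for the quality of each expert transition."""
    pred_act = policy(expert_obs)
    # Generator term: push the discriminator to rate policy actions as expert-like.
    # (In full GAIL this term is computed on policy rollouts; expert states are
    # reused here only to keep the sketch self-contained.)
    gail_loss = -torch.log(disc(expert_obs, pred_act) + 1e-8).mean()
    # BC term: imitate expert actions, down-weighting high-TD-error samples.
    weights = torch.exp(-td_error.abs().detach())
    bc_loss = (weights * ((pred_act - expert_act) ** 2).mean(dim=-1)).mean()
    return gail_loss + bc_loss

if __name__ == "__main__":
    obs_dim, act_dim, batch = 8, 2, 32
    policy, disc = Policy(obs_dim, act_dim), Discriminator(obs_dim, act_dim)
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    expert_obs = torch.randn(batch, obs_dim)
    expert_act = torch.rand(batch, act_dim) * 2 - 1
    td_error = torch.randn(batch)  # placeholder TD errors of expert transitions
    loss = hybrid_imitation_loss(policy, disc, expert_obs, expert_act, td_error)
    opt.zero_grad(); loss.backward(); opt.step()
    print(float(loss))
```

A policy pre-trained with a loss of this kind could then serve as the initial model for a standard reinforcement learning algorithm, consistent with the abstract's claim that only minor further optimization is needed.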

Key words: Intelligent decision-making, Reinforcement learning, Behavior cloning, Generative adversarial imitation learning

CLC Number: TP182
[1]WEN G H,YANG T,ZHOU J L,et al.Reinforcement learning and adaptive/approximate dynamic programming:A survey from theory to applications in multi-agent systems[J].Control and Decision,2023,38(5):1200-1230.
[2]ZHANG M Y,DOU Y J,CHEN Z Y,et al.Review of deep reinforcement learning and its applications in military field[J].Systems Engineering and Electronics,2024,46(4):1297-1308.
[3]HAO J Y,SHAO K,LI K,et al.Research and Application of Game Intelligence[J].SCIENTIA SINICA(Informationis),2023,53(10):1892-1923.
[4]KHATIB O.Real-time obstacle avoidance for manipulators and mobile robots[C]//IEEE International Conference on Robotics and Automation(ICRA).IEEE,1985:500-505.
[5]WANG X F,GU K R.A penetration strategy combining deep reinforcement learning and imitation learning[J].Journal of Astronautics,2023,44(6):914-925.
[6]LI Y Z,SONG J M,ERMON S.InfoGAIL:Interpretable imitation learning from visual demonstrations[C]//31st International Conference on Neural Information Processing Systems(NIPS).Cambridge:MIT Press,2017:3815-3825.
[7]WANG Z Y,MEREL J,REED S,et al.Robust imitation of diverse behaviors[C]//31st International Conference on Neural Information Processing Systems(NIPS).Cambridge:MIT Press,2017:5326-5335.
[8]MEREL J,TASSA Y,DHRUVA T B,et al.Learning human behaviors from motion capture by adversarial imitation[J].arXiv:1707.02201,2017.
[9]LIN J H,ZHANG Z Z.ACGAIL:Imitation learning about multiple intentions with auxiliary classifier GANs[C]//15th Pacific Rim International Conference on Artificial Intelligence(PRICAI).Switzerland:Springer,Cham,2018:321-334.
[10]BHATTACHARYYA R P,PHILLIPS D J,WULFE B,et al.Multi-agent imitation learning for driving simulation[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS).Piscataway:IEEE,2018:1534-1539.
[11]FU Y P,DENG X Y,ZHU Z Q,et al.Fixed-wing aircraft attitude controller based on imitation reinforcement learning[J].Journal of Naval Aeronautical and Astronautical University,2022,37(5):393-399.
[12]WANG H J,TAO Y,LU C F.A Reinforcement Imitation Learning-based Robot Navigation Method with Collision Prediction[J].Computer Engineering and Applications,2024,60(10):341-352.
[13]POMERLEAU D A.Efficient training of artificial neural networks for autonomous navigation[J].Neural Computation,1991,3(1):88-97.
[14]BOJARSKI M,TESTA D D,DWORAKOWSKI D,et al.End to end learning for self-driving cars[J].arXiv:1604.07316,2016.
[15]PFLUEGER M,AGHA A,SUKHATME G S.Rover-IRL:Inverse reinforcement learning with soft value iteration networks for planetary rover path planning[J].IEEE Robotics and Automation Letters,2019,4(2):1387-1394.
[16]NG A Y,RUSSELL S J.Algorithms for inverse reinforcement learning[C]//17th International Conference on Machine Learning(ICML).Association for Computing Machinery,2000:663-670.
[17]WU S B,FU Q M,CHEN J P,et al.Meta-inverse reinforcement learning method based on relative entropy[J].Computer Science,2021,48(9):257-263.
[18]HO J,ERMON S.Generative adversarial imitation learning[C]//30th International Conference on Neural Information Processing Systems.Curran Associates Inc,2016:4572-4580.
[19]JIANG C,ZHANG Z C,CHEN Z X,et al.Data efficient third-person imitation learning method[J].Computer Science,2021,48(2):238-244.
[20]XIAO D M,WANG B,SUN Z Q,et al.Behavioral cloning based model generation method for reinforcement learning[C]//China Automation Congress(CAC).IEEE,2023:6776-6781.
[21]XIAO D M,WANG B,SUN Z Q,et al.Imitation learning method of multi-quality expert data based on GAIL[C]//China Symposium on Cognitive Computing and Hybrid Intelligence(CCHI).IEEE,2023:8642-8647.