Multi-agent Pursuit Decision-making Method Based on Hybrid Imitation Learning

doi:10.11896/jsjkx.240800072

Abstract

Traditional imitation learning approaches handle diverse expert trajectories poorly; in particular, they struggle to integrate fixed-modality expert data of varying quality. To address these limitations, this paper combines the multiple trajectories generative adversarial imitation learning (MT-GAIL) method with temporal-difference error behavioral cloning (TD-BC) to construct a hybrid imitation learning framework. The framework improves the model's adaptability to complex and dynamic expert strategies and its robustness in extracting useful information from low-quality data. The resulting model can be used directly in reinforcement learning: only minor adjustments and optimizations are needed to train a usable reinforcement learning model grounded in expert experience. Experiments in a two-dimensional pursuit scenario with a mix of dynamic and static targets validate the method. The results indicate that it effectively assimilates expert knowledge and provides a high-starting-point initial model for subsequent reinforcement learning training.

Key words: Intelligent decision-making, Reinforcement learning, Behavior cloning, Generative adversarial imitation learning

CLC Number: TP182

WANG Yanning, ZHANG Fengdi, XIAO Dengmin, SUN Zhongqi. Multi-agent Pursuit Decision-making Method Based on Hybrid Imitation Learning[J]. Computer Science, 2025, 52(1): 323-330.



