Computer Science ›› 2025, Vol. 52 ›› Issue (1): 323-330. doi: 10.11896/jsjkx.240800072

• Artificial Intelligence •


Multi-agent Pursuit Decision-making Method Based on Hybrid Imitation Learning

WANG Yanning1,2, ZHANG Fengdi1,2, XIAO Dengmin3, SUN Zhongqi4   

  1 Beijing Aerospace Automatic Control Institute, Beijing 100854, China
    2 National Key Laboratory of Science and Technology on Aerospace Intelligence Control, Beijing 100854, China
    3 China Ship Intelligence and Marine Innovation Research Institute Co., Ltd., Beijing 100094, China
    4 School of Automation, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2024-08-13  Revised: 2024-09-23  Online: 2025-01-15  Published: 2025-01-09
  • Corresponding author: XIAO Dengmin (2712538468@qq.com)
  • About author: WANG Yanning, born in 1981, master (wyn_81_2049@163.com). His main research interest is reinforcement learning.
    XIAO Dengmin, born in 1999, master. Her main research interests include imitation learning and reinforcement learning.


Abstract: Traditional imitation learning methods have difficulty handling diverse expert trajectories, particularly in effectively integrating fixed-modality expert data of varying quality. To address this, this paper combines the multiple trajectories generative adversarial imitation learning (MT-GAIL) method with temporal-difference error behavioral cloning (TD-BC) to construct a hybrid imitation learning framework. The framework not only enhances the model's adaptability to complex and varied expert strategies but also improves its robustness in extracting useful information from low-quality data. The resulting model can be applied directly to reinforcement learning: with only minor adjustment and optimization, it yields a readily usable reinforcement learning model grounded in expert experience. Experimental validation in a two-dimensional pursuit scenario with both moving and stationary targets shows that the method performs well. The results indicate that the proposed method effectively assimilates expert knowledge, providing a high-starting-point, effective initial model for the subsequent reinforcement learning training phase.
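For readers who want a concrete picture of how the two components can be combined, the following is a minimal PyTorch-style sketch, not the authors' implementation. It assumes a deterministic policy, a GAIL-style discriminator trained on expert versus policy (state, action) pairs, and a behavioral-cloning loss over expert data re-weighted by TD error so that low-quality demonstrations contribute less; all names (Policy, Discriminator, hybrid_update, value_fn, bc_coef) are hypothetical, and the paper's actual MT-GAIL/TD-BC details may differ.

```python
# Hypothetical sketch of one hybrid imitation-learning update combining a
# GAIL-style discriminator with TD-error-weighted behavioral cloning (TD-BC).
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Deterministic policy mapping observations to actions (illustrative)."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim))
    def forward(self, obs):
        return self.net(obs)

class Discriminator(nn.Module):
    """Scores (state, action) pairs: high logits for expert-like behavior."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 1))
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def hybrid_update(policy, disc, value_fn, expert_batch, policy_batch,
                  pi_opt, d_opt, gamma=0.99, bc_coef=0.5):
    obs_e, act_e, rew_e, next_obs_e = expert_batch   # expert transitions
    obs_p, act_p = policy_batch                      # policy rollout pairs
    bce = nn.BCEWithLogitsLoss()

    # 1) GAIL step: discriminator separates expert pairs from policy pairs.
    d_loss = bce(disc(obs_e, act_e), torch.ones(len(obs_e), 1)) + \
             bce(disc(obs_p, act_p.detach()), torch.zeros(len(obs_p), 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) TD-BC step: weight each expert transition by its TD error so that
    #    low-quality demonstrations contribute less to the cloning loss.
    with torch.no_grad():
        td_err = rew_e + gamma * value_fn(next_obs_e) - value_fn(obs_e)
        w = torch.softmax(-td_err.abs().squeeze(-1), dim=0)  # down-weight large TD error
    bc_loss = (w * ((policy(obs_e) - act_e) ** 2).mean(dim=-1)).sum()

    # 3) Adversarial term: the policy tries to make its own (state, action)
    #    pairs look expert-like to the discriminator.
    gail_loss = bce(disc(obs_p, policy(obs_p)), torch.ones(len(obs_p), 1))

    pi_loss = gail_loss + bc_coef * bc_loss
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
    return d_loss.item(), pi_loss.item()
```

The key design choice this sketch illustrates is the division of labor: the discriminator handles diverse expert modes adversarially, while the TD-error weighting filters the fixed expert dataset so that cloning does not overfit to poor demonstrations; the trained policy can then seed a standard reinforcement learning loop.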

Key words: Intelligent decision-making, Reinforcement learning, Behavior cloning, Generative adversarial imitation learning

CLC number: TP182