Computer Science, 2025, Vol. 52, Issue (1): 323-330. DOI: 10.11896/jsjkx.240800072
WANG Yanning1,2, ZHANG Fengdi1,2, XIAO Dengmin3, SUN Zhongqi4
Abstract: To address the limitations of traditional imitation learning methods in handling diverse expert trajectories, in particular the difficulty of effectively integrating fixed-modality expert data of uneven quality, this paper combines Multiple Trajectories Generative Adversarial Imitation Learning (MT-GAIL) with Temporal-Difference Error Behavioral Cloning (TD-BC) to construct a hybrid imitation learning framework. The framework not only strengthens the model's ability to adapt to complex and varied expert strategies, but also improves its robustness in extracting useful information from low-quality data. The resulting model can be applied directly to reinforcement learning: with only minor adjustment and fine-tuning, it yields a readily usable reinforcement learning model grounded in expert experience. Experiments in a two-dimensional pursuit scenario with both static and moving targets show that the method performs well. The results indicate that the proposed approach absorbs expert experience and provides a high-starting-point, high-quality initial model for the subsequent reinforcement learning training stage.
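The full MT-GAIL and TD-BC formulations are given in the paper body. As a rough, hypothetical sketch of the TD-BC idea only (not the authors' implementation), one way to let low-quality expert transitions contribute less is to weight a behavioral-cloning loss by the magnitude of each transition's one-step temporal-difference error; all function names and the softmax weighting below are illustrative assumptions:

```python
import math

def td_errors(rewards, values, gamma=0.99):
    # One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t),
    # taking V at the terminal state to be 0.
    deltas = []
    for t, (r, v) in enumerate(zip(rewards, values)):
        v_next = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(r + gamma * v_next - v)
    return deltas

def bc_weights(deltas):
    # Softmax over -|delta|: transitions whose TD error is small
    # (i.e. consistent with the learned value function) receive
    # more weight in the behavioral-cloning loss.
    scores = [-abs(d) for d in deltas]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_bc_loss(expert_log_probs, weights):
    # Weighted negative log-likelihood of the expert's actions
    # under the current policy.
    return -sum(w * lp for w, lp in zip(weights, expert_log_probs))

# Example: three expert transitions with a value estimate per state.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]
w = bc_weights(td_errors(rewards, values, gamma=0.9))
```

In this sketch the third transition (smallest |TD error|) gets the largest cloning weight, so the policy imitates value-consistent expert data more strongly than noisy data.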