Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 240100211-11. doi: 10.11896/jsjkx.240100211

• Intelligent Computing •


Mobile Robots' Path Planning Method Based on Policy Fusion and Spiking Deep Reinforcement Learning

AN Yang1,2,3, WANG Xiuqing1,2,3, ZHAO Minghua1   

  1 College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
    2 Hebei Provincial Key Laboratory of Network & Information Security, Shijiazhuang 050024, China
    3 Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China
  • Online: 2024-11-16  Published: 2024-11-13
  • Corresponding author: WANG Xiuqing (xqwang@hebtu.edu.cn)
  • About author: AN Yang, born in 2000, postgraduate (ay840962872@163.com), is a member of CCF (No.J6878G). His main research interests include deep reinforcement learning and spiking neural networks.
    WANG Xiuqing, born in 1970, Ph.D., professor. Her main research interests include spiking neural networks, artificial intelligence, advanced robotic technology, and fault detection and diagnosis.
  • Supported by:
    General Program of the National Natural Science Foundation of China (61673160, 61175059), Natural Science Foundation of Hebei Province, China (F2018205102), and Key Science and Technology Research Project of Colleges and Universities in Hebei Province (ZD2021063).


Abstract: Deep reinforcement learning (DRL) has been successfully applied to mobile robots' path planning. DRL-based path planning methods are suitable for high-dimensional environments and are an important way to achieve autonomous learning in mobile robots. However, training DRL models requires a large amount of interaction experience with the environment, which means higher computational cost. In addition, the limited capacity of the experience replay buffer in DRL algorithms cannot guarantee that experiences are used effectively. Spiking neural networks (SNNs), one of the important tools for brain-inspired computing, are well suited to robots' environmental perception and control owing to their unique biological plausibility and their ability to incorporate spatial and temporal information simultaneously. Combining SNNs, convolutional neural networks (CNNs) and policy fusion, this paper studies DRL-based path planning for mobile robots and accomplishes the following work: 1) The SCDDPG (Spike Convolutional DDPG) algorithm is proposed, which uses CNNs for multi-channel feature extraction of the input states and SNNs for spatio-temporal learning of the extracted features. 2) Based on SCDDPG, the SC2DDPG (State Constraint SCDDPG) algorithm is proposed. SC2DDPG constrains the robot's operating states with a designed state constraint policy, which avoids unnecessary environment exploration and improves the convergence speed of the DRL model in SC2DDPG. 3) Based on SCDDPG, the PFTDDPG (Policy Fusion and Transfer SCDDPG) algorithm is proposed. PFTDDPG fuses a staged control mode with the DRL algorithm, applies a wall-following policy to pass the wedge-shaped obstacles in the environment, and introduces transfer learning to transfer prior knowledge between policies. PFTDDPG not only completes path planning tasks that cannot be completed by RL alone, but also yields optimal collision-free paths; furthermore, it improves the convergence speed of the DRL model and the quality of the planned paths. Experimental results validate the effectiveness of the three proposed path planning algorithms, and comparative experiments show that, among the SpikeDDPG, SCDDPG, SC2DDPG and PFTDDPG algorithms, PFTDDPG performs best in terms of path planning success rate, training convergence speed and planned path length. This work offers new ideas for mobile robots' path planning and enriches the DRL-based solutions to this problem.
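To make the SCDDPG idea above concrete, the following is a minimal, hypothetical PyTorch sketch of a DDPG-style actor in which convolutional layers extract multi-channel features from the input state and a simple leaky integrate-and-fire (LIF) layer processes them over several simulation steps. The layer sizes, the number of time steps T, the LIF constants, and the names SpikeConvActor and LIFLayer are illustrative assumptions, not the authors' architecture; a trainable SNN actor would additionally need a surrogate gradient through the spike threshold.

```python
import torch
import torch.nn as nn


class LIFLayer(nn.Module):
    """Leaky integrate-and-fire neurons with a hard threshold and soft reset (sketch only)."""

    def __init__(self, tau=2.0, v_th=1.0):
        super().__init__()
        self.tau, self.v_th = tau, v_th

    def forward(self, current_seq):                  # current_seq: [T, batch, features]
        v = torch.zeros_like(current_seq[0])         # membrane potential, one per neuron
        spikes = []
        for i_t in current_seq:                      # iterate over the T simulation steps
            v = v + (i_t - v) / self.tau             # leaky integration of the input current
            s = (v >= self.v_th).float()             # fire when the threshold is crossed
            v = v - s * self.v_th                    # soft reset of neurons that fired
            spikes.append(s)
        return torch.stack(spikes)                   # spike trains, shape [T, batch, features]


class SpikeConvActor(nn.Module):
    """DDPG-style actor: CNN feature extraction -> LIF layer -> continuous action head."""

    def __init__(self, in_channels=3, action_dim=2, hidden=256, T=8):
        super().__init__()
        self.T = T
        self.cnn = nn.Sequential(                    # multi-channel feature extraction
            nn.Conv2d(in_channels, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.proj = nn.LazyLinear(hidden)            # project CNN features to LIF input currents
        self.lif = LIFLayer()
        self.head = nn.Linear(hidden, action_dim)    # decode firing rates into actions

    def forward(self, state_img):                    # state_img: [batch, C, H, W]
        feat = self.proj(self.cnn(state_img))
        # Feed the feature vector as a constant input current for T steps,
        # then use the mean firing rate as the representation for the action head.
        rate = self.lif(feat.unsqueeze(0).repeat(self.T, 1, 1)).mean(dim=0)
        return torch.tanh(self.head(rate))           # bounded continuous action, as in DDPG


if __name__ == "__main__":
    actor = SpikeConvActor()
    dummy_states = torch.rand(4, 3, 64, 64)          # a batch of fake multi-channel state images
    print(actor(dummy_states).shape)                 # torch.Size([4, 2])
```

As the abstract describes, a PFTDDPG-style controller would further wrap such an actor in a staged control mode: a hand-coded wall-following routine takes over near wedge-shaped obstacles, control returns to the learned policy elsewhere, and transfer learning carries prior knowledge between training stages. That switching logic is omitted from the sketch above.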

Key words: Deep reinforcement learning, Spiking neural networks, Convolutional neural networks, Transfer learning, Mobile robot path planning

CLC Number: TP183