Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 240100211-11.doi: 10.11896/jsjkx.240100211

• Intelligent Computing •

Mobile Robots' Path Planning Method Based on Policy Fusion and Spiking Deep Reinforcement Learning

AN Yang1,2,3, WANG Xiuqing1,2,3, ZHAO Minghua1   

  1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
    2. Hebei Provincial Key Laboratory of Network & Information Security, Shijiazhuang 050024, China
    3. Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China
  • Online: 2024-11-16 Published: 2024-11-13
  • About author: AN Yang, born in 2000, postgraduate, is a member of CCF (No. J6878G). His main research interests include deep reinforcement learning and spiking neural networks.
    WANG Xiuqing, born in 1970, Ph.D., professor. Her main research interests include spiking neural networks, artificial intelligence, advanced robotic technology, and fault detection and diagnosis.
  • Supported by:
    General Program of the National Natural Science Foundation of China (61673160, 61175059), Natural Science Foundation of Hebei Province, China (F2018205102), and the Science and Technology Research Project of Colleges and Universities in Hebei Province (ZD2021063).

Abstract: Deep reinforcement learning (DRL) has been applied successfully to mobile robots' path planning, and DRL-based path planning methods are suitable for high-dimensional environments and stand as a crucial approach for achieving autonomous learning in mobile robots. However, training DRL models requires a large amount of experience gathered through interaction with the environment, which leads to heavy computational cost. In addition, the limited replay-memory capacity of DRL algorithms makes it hard to guarantee that experiences are used effectively. Spiking neural networks (SNNs), one of the main tools of brain-inspired computing, are well suited to robots' environmental perception and control owing to their unique bio-plausibility and their ability to incorporate spatio-temporal information simultaneously. In this paper, we combine SNNs, convolutional neural networks (CNNs), and policy fusion for DRL-based mobile robots' path planning, and make the following contributions: 1) We propose the spike convolutional DDPG (SCDDPG) algorithm, which employs CNNs for multi-channel feature extraction of input states and SNNs for spatio-temporal feature extraction. 2) Based on SCDDPG and a designed state-constraint policy, the state-constraint SCDDPG (SC2DDPG) algorithm is proposed to constrain the robot's operating states, which avoids unnecessary environment exploration and improves the convergence speed of the DRL model in SC2DDPG. 3) Based on SCDDPG, the policy fusion and transfer SCDDPG (PFTDDPG) algorithm is proposed. PFTDDPG implements the "wall-follow" policy to pass wedge-shaped obstacles in the environment, and incorporates transfer learning to transfer prior knowledge between policies in mobile robots' path planning. PFTDDPG not only completes path-planning tasks that cannot be solved by RL alone, but also yields optimal collision-free paths; furthermore, it improves the convergence speed of the DRL model and the quality of the planned path. Experimental results validate the effectiveness of the proposed path planning algorithms. Comparative experiments among the SpikeDDPG, SCDDPG, SC2DDPG and PFTDDPG algorithms indicate that PFTDDPG achieves the best performance in path-planning success rate, training convergence speed, and planned path length. This paper not only proposes new ideas for mobile robots' path planning, but also enriches the solution strategies of DRL in mobile robots' path planning.
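
To make the core idea of SCDDPG concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released code): convolutional layers extract multi-channel features from the input state, a leaky integrate-and-fire (LIF) spiking layer processes them over a short time window, and a DDPG-style head outputs continuous actions. All class names, layer sizes, the time window, and the LIF constants are illustrative assumptions.

```python
# Hypothetical sketch of an SCDDPG-style actor: CNN feature extraction + LIF spiking
# dynamics + DDPG-style continuous action head. Sizes and constants are assumed values.
import torch
import torch.nn as nn


class LIFLayer(nn.Module):
    """Leaky integrate-and-fire neurons with a hard threshold (forward pass only)."""

    def __init__(self, size, tau=2.0, v_threshold=1.0):
        super().__init__()
        self.size = size
        self.tau = tau                  # membrane time constant (assumed value)
        self.v_threshold = v_threshold  # firing threshold (assumed value)

    def forward(self, current_seq):
        # current_seq: (T, batch, size) input currents over T simulation steps
        v = torch.zeros(current_seq.shape[1], self.size, device=current_seq.device)
        spikes = []
        for current in current_seq:
            v = v + (current - v) / self.tau          # leaky integration of input current
            spike = (v >= self.v_threshold).float()   # fire when the threshold is crossed
            v = v * (1.0 - spike)                     # hard reset after a spike
            spikes.append(spike)
        return torch.stack(spikes)                    # (T, batch, size) spike trains


class SCDDPGActor(nn.Module):
    """CNN multi-channel feature extractor + LIF layer + continuous action head."""

    def __init__(self, in_channels=3, feature_dim=256, action_dim=2, time_steps=8):
        super().__init__()
        self.time_steps = time_steps
        self.cnn = nn.Sequential(                      # multi-channel feature extraction
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(32 * 4 * 4, feature_dim),
        )
        self.lif = LIFLayer(feature_dim)
        self.head = nn.Linear(feature_dim, action_dim)  # e.g. linear and angular velocity

    def forward(self, state):
        features = self.cnn(state)                                    # (batch, feature_dim)
        current_seq = features.unsqueeze(0).repeat(self.time_steps, 1, 1)
        spike_rate = self.lif(current_seq).mean(dim=0)                # rate-coded output
        return torch.tanh(self.head(spike_rate))                      # bounded actions


if __name__ == "__main__":
    actor = SCDDPGActor()
    state = torch.randn(1, 3, 64, 64)   # one multi-channel observation (assumed shape)
    print(actor(state))                  # tensor of shape (1, 2)
```

In a full SCDDPG agent this actor would be paired with a critic and trained with the standard DDPG update; the sketch only illustrates how CNN feature extraction and spiking dynamics can be composed in one policy network.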

Key words: Deep reinforcement learning, Spiking neural networks, Convolutional neural networks, Transfer learning, Mobile robot path planning

CLC Number: TP183