Computer Science ›› 2023, Vol. 50 ›› Issue (3): 323-332. doi: 10.11896/jsjkx.220100007

• Artificial Intelligence •


Real-time Trajectory Planning Algorithm Based on Collision Criticality and Deep Reinforcement Learning

XU Linling1, ZHOU Yuan2, HUANG Hongyun3, LIU Yang1,2   

1 School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
    2 School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
    3 Center of Library Big Data Processing and Analysis, Zhejiang Sci-Tech University, Hangzhou 310018, China
  • Received: 2022-01-04  Revised: 2022-08-14  Online: 2023-03-15  Published: 2023-03-15
  • Corresponding author: HUANG Hongyun (huanghongyun07@hotmail.com)
  • About author: XU Linling, born in 1997, postgraduate (201930605055@mails.zstu.edu.cn). Her main research interests include robot intelligent control and deep reinforcement learning.
    HUANG Hongyun, born in 1977, master, lecturer. Her main research interests include intelligent system modeling and analysis, and information management.
  • Supported by:
    National Natural Science Foundation of China (62132014), Science and Technology Plan Project of Zhejiang Province, China (2022C01045), Opening Project of Shanghai Trusted Industrial Control Platform (21170022-N) and Project Funded by Shanghai Industrial Control Security Innovation Technology Co., Ltd. (21170424-J).


Abstract: Real-time collision avoidance in dynamic environments is a major challenge in trajectory planning for mobile robots. Focusing on environments with a variable number of obstacles, this paper proposes a real-time trajectory planning algorithm, Crit-LSTM-DRL, based on long short-term memory (LSTM) and deep reinforcement learning (DRL). First, it predicts the time at which a collision between an obstacle and the robot may occur, based on their states, and computes the collision criticality of each obstacle with respect to the robot. Second, it sorts the obstacles by collision criticality, from low to high, and uses an LSTM to extract a fixed-dimension vector that represents the environment. Finally, the robot state and the extracted vector are concatenated as the input of the DRL value network to compute the value of the system state. At any instant, for each action, Crit-LSTM-DRL predicts the value of the next state with the LSTM and DRL models, and hence the value of the current state; the action that yields the maximal current-state value is selected to control the robot. To evaluate the performance of Crit-LSTM-DRL, three models are trained in three different environments: one with 5 obstacles, one with 10 obstacles, and one with a variable number of obstacles (1-10). The models are then tested in environments containing different numbers of obstacles. To further investigate the effect of the interaction between each obstacle and the robot, this paper also takes the joint state of an obstacle and the robot as the obstacle's state and trains another three models in the above training environments. Experimental results demonstrate the effectiveness and efficiency of Crit-LSTM-DRL.
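
To make the decision step concrete, the following is a minimal, self-contained Python sketch of the pipeline the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: the constant-velocity time-to-collision estimate, the criticality form 1/(1 + TTC), the state layouts, the network sizes, and all identifiers (time_to_collision, CritLSTMValueNet, state_value, select_action) are hypothetical.

```python
import math
import torch
import torch.nn as nn

def time_to_collision(p_rel, v_rel, r_sum):
    """Smallest t >= 0 with ||p_rel + t * v_rel|| = r_sum, assuming both
    agents keep their current velocities; math.inf if they never get
    that close."""
    a = float(torch.dot(v_rel, v_rel))
    b = 2.0 * float(torch.dot(p_rel, v_rel))
    c = float(torch.dot(p_rel, p_rel)) - r_sum ** 2
    if c <= 0.0:
        return 0.0                          # already within r_sum
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0 or b >= 0.0:
        return math.inf                     # receding or never colliding
    return (-b - math.sqrt(disc)) / (2.0 * a)

class CritLSTMValueNet(nn.Module):
    """LSTM encoder over the criticality-sorted obstacle sequence plus an
    MLP value head on [robot_state, environment_code]."""
    def __init__(self, obs_dim=5, robot_dim=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(robot_dim + hidden, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, robot_state, obstacle_seq):
        # obstacle_seq: (1, n_obstacles, obs_dim), low -> high criticality
        _, (h, _) = self.lstm(obstacle_seq)
        env_code = h[-1]                    # fixed-dimension env vector
        return self.head(torch.cat([robot_state, env_code], dim=-1))

def state_value(net, robot_state, obstacles, r_sum=0.6):
    """Sort obstacles from low to high criticality and evaluate the state.
    Each obstacle is (px, py, vx, vy, radius); the robot state is
    (px, py, vx, vy, gx, gy)."""
    def criticality(ob):
        ttc = time_to_collision(ob[:2] - robot_state[:2],
                                ob[2:4] - robot_state[2:4], r_sum)
        return 1.0 / (1.0 + ttc)            # assumed criticality form
    seq = torch.stack(sorted(obstacles, key=criticality))
    return net(robot_state.unsqueeze(0), seq.unsqueeze(0))

def select_action(net, robot_state, obstacles, actions, dt=0.25):
    """One-step lookahead: simulate each candidate velocity command for dt
    seconds, advance obstacles at constant velocity, and return the action
    whose successor state has the highest estimated value."""
    best, best_v = None, -math.inf
    with torch.no_grad():
        for ax, ay in actions:
            nxt = robot_state.clone()
            nxt[0] += ax * dt; nxt[1] += ay * dt    # position update
            nxt[2] = ax; nxt[3] = ay                # velocity update
            moved = []
            for ob in obstacles:
                m = ob.clone()
                m[0] += m[2] * dt; m[1] += m[3] * dt
                moved.append(m)
            v = float(state_value(net, nxt, moved))
            if v > best_v:
                best, best_v = (ax, ay), v
    return best
```

For example, with robot_state = torch.tensor([0., 0., 0., 0., 4., 4.]), a list of 5-dimensional obstacle tensors, and a discrete set of velocity commands, select_action returns the command whose one-step successor state the value network scores highest, matching the greedy, value-based action selection described above.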

Key words: Trajectory planning, Collision avoidance, Obstacle criticality, Deep reinforcement learning

CLC Number: 

  • TP242