Computer Science, 2023, Vol. 50, Issue (3): 323-332. doi: 10.11896/jsjkx.220100007
XU Linling1, ZHOU Yuan2, HUANG Hongyun3, LIU Yang1,2
Abstract: Real-time collision avoidance in dynamic environments is a major challenge in mobile-robot trajectory planning. For environments with a variable number of obstacles, this paper proposes Crit-LSTM-DRL, a real-time trajectory-planning algorithm based on LSTM (Long Short-Term Memory) and DRL (Deep Reinforcement Learning). First, based on the states of the robot and the obstacles, the algorithm predicts when a collision may occur and computes each obstacle's collision criticality with respect to the robot. Second, the obstacles are sorted by collision criticality from low to high, and an LSTM model extracts a fixed-dimension environment representation vector from the sorted sequence. Finally, the robot state and this representation vector are fed to the DRL model to compute the value of the corresponding state. At each time step, for every candidate action, the value of the resulting next state is computed through the LSTM and DRL models, yielding the maximum value of the current state and its corresponding action. Three models are trained in different environments: one in an environment with 5 obstacles, one with 10 obstacles, and one with a variable number of obstacles (1 to 10); their performance is analyzed across different test environments. To further analyze the interaction between an individual obstacle and the robot, each obstacle is also represented as a joint state of the obstacle and the robot, and the performance of the models trained in the above three environments is analyzed under this representation. Experimental results verify the effectiveness of Crit-LSTM-DRL.
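The criticality ordering described above can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: it assumes circular robot and obstacles moving at constant velocity, computes time-to-collision analytically, and treats a smaller time-to-collision as higher criticality. All helper names (`time_to_collision`, `sort_by_criticality`) and the infinite-horizon fallback for non-colliding obstacles are our own assumptions.

```python
import math

def time_to_collision(p_rel, v_rel, r_sum):
    """Earliest t >= 0 at which ||p_rel + t * v_rel|| = r_sum, else inf.

    p_rel, v_rel: obstacle position/velocity relative to the robot (2D).
    r_sum: sum of robot and obstacle radii.
    """
    # Expand ||p + t v||^2 = r_sum^2 into a quadratic a t^2 + b t + c = 0.
    a = v_rel[0] ** 2 + v_rel[1] ** 2
    b = 2.0 * (p_rel[0] * v_rel[0] + p_rel[1] * v_rel[1])
    c = p_rel[0] ** 2 + p_rel[1] ** 2 - r_sum ** 2
    if a == 0.0:  # no relative motion: colliding now or never
        return 0.0 if c <= 0.0 else math.inf
    disc = b * b - 4.0 * a * c
    if disc < 0.0:  # paths never come within r_sum of each other
        return math.inf
    sqrt_disc = math.sqrt(disc)
    for t in sorted([(-b - sqrt_disc) / (2.0 * a), (-b + sqrt_disc) / (2.0 * a)]):
        if t >= 0.0:
            return t
    return math.inf  # collision window lies entirely in the past

def sort_by_criticality(robot, obstacles):
    """Order obstacles from least to most critical (descending time-to-collision).

    Feeding the sequence to an LSTM in this order lets the most urgent
    obstacle influence the final hidden state last.
    """
    def ttc(ob):
        p_rel = (ob["p"][0] - robot["p"][0], ob["p"][1] - robot["p"][1])
        v_rel = (ob["v"][0] - robot["v"][0], ob["v"][1] - robot["v"][1])
        return time_to_collision(p_rel, v_rel, robot["r"] + ob["r"])
    return sorted(obstacles, key=ttc, reverse=True)

# Example: a stationary robot, one approaching and one receding obstacle.
robot = {"p": (0.0, 0.0), "v": (0.0, 0.0), "r": 0.5}
approaching = {"p": (10.0, 0.0), "v": (-1.0, 0.0), "r": 0.5}  # hits at t = 9
receding = {"p": (5.0, 0.0), "v": (1.0, 0.0), "r": 0.5}       # never hits
ordered = sort_by_criticality(robot, [approaching, receding])
# ordered[-1] is the approaching (most critical) obstacle
```

After this ordering, the fixed-dimension representation would be the LSTM's final hidden state over the sorted sequence, concatenated with the robot state as input to the value network.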
CLC Number: