Computer Science ›› 2023, Vol. 50 ›› Issue (1): 194-204. doi: 10.11896/jsjkx.220500241

• Artificial Intelligence •

Bi-level Path Planning Method for Unmanned Vehicle Based on Deep Reinforcement Learning

HUANG Yuzhou, WANG Lisong, QIN Xiaolin

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2022-05-25  Revised: 2022-09-12  Online: 2023-01-15  Published: 2023-01-09
  • Corresponding author: QIN Xiaolin (qinxcs@nuaa.edu.cn)
  • About author: HUANG Yuzhou (huangyuzhou@nuaa.edu.cn), born in 1998, postgraduate. His main research interests include robot intelligence, etc.
    QIN Xiaolin, born in 1953, Ph.D, is a senior member of China Computer Federation. His main research interests include data management and security in distributed environment, etc.
  • Supported by: National Natural Science Foundation of China (61728204).


Abstract: With the wide application of intelligent unmanned vehicles, intelligent navigation, path planning and obstacle avoidance have become important research topics. This paper proposes a method based on the model-free deep reinforcement learning algorithms DDPG and SAC, which uses environmental information to navigate to the target point, avoids static and dynamic obstacles, and generalizes across different environments. By combining global planning with local obstacle avoidance, the method solves the path planning problem with better global optimality and robustness, solves the obstacle avoidance problem with better dynamic adaptability and generalization, and shortens the iteration time. In the network training stage, traditional algorithms such as PID and A* are incorporated to improve the convergence speed and stability of the method. Finally, a variety of experimental scenarios, including navigation and obstacle avoidance, are designed in the robot operating system ROS and the Gazebo simulator. Simulation results verify the reliability of the proposed approach, which accounts for both the global and dynamic nature of the problem and improves the generated paths and time efficiency.
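The global layer of such a bi-level scheme is typically a classical grid planner. As a minimal illustrative sketch (not the paper's implementation), an A* search over a 4-connected occupancy grid can supply the waypoints that a learned local policy then follows:

```python
import heapq

def astar(grid, start, goal):
    """A* shortest path on a 4-connected occupancy grid (0 = free, 1 = blocked).

    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(p):
        # Manhattan distance: an admissible heuristic for 4-connectivity
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_set = [(h(start), start)]  # priority queue of (f = g + h, cell)
    g_cost = {start: 0}
    came_from = {}
    closed = set()
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur in closed:
            continue
        closed.add(cur)
        if cur == goal:
            path = [goal]  # walk parent links back to the start
            while path[-1] != start:
                path.append(came_from[path[-1]])
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g_cost[cur] + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    came_from[nxt] = cur
                    heapq.heappush(open_set, (ng + h(nxt), nxt))
    return None
```

In the setting the abstract describes, waypoints from such a global plan would be handed to the DRL local policy, which handles the dynamic obstacles encountered between them.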

Key words: Unmanned vehicle, Obstacle avoidance, Path planning, Deep reinforcement learning
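The abstract mentions combining the DRL networks with traditional controllers such as PID during training to improve convergence speed and stability. A minimal discrete PID controller, with gains and timestep that are illustrative assumptions rather than values from the paper, might look like:

```python
class PID:
    """Discrete PID controller, e.g. for heading-error correction.

    The gains (kp, ki, kd) and timestep dt are illustrative assumptions,
    not values taken from the paper.
    """

    def __init__(self, kp, ki, kd, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        """Return the control output for the current error sample."""
        self.integral += error * self.dt
        # No derivative term on the very first sample
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv
```

A controller like this could supply reference actions early in training, with the learned policy gradually taking over as it improves.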

CLC Number: TP311