Computer Science ›› 2019, Vol. 46 ›› Issue (11A): 94-97.

• Intelligent Computing •

Dynamic Target Following Based on Reinforcement Learning of Robot-car

XU Ji-ning, ZENG Jie   

  1. (School of Electrical and Control Engineering, North China University of Technology, Beijing 100043, China)
  • Online: 2019-11-10 Published: 2019-11-20
  • Corresponding author: XU Ji-ning (born 1970), female, Ph.D., associate professor. Her main research interests include signal processing, industrial control, and bus technology. E-mail: jxu0422@ncut.edu.
  • Funding:
    This work was supported by the Special Research Project of North China University of Technology (108051360018XN073).


Abstract: Robot path planning has long been a focus of research in robot motion control. Current path-planning methods spend considerable time building a map, whereas reinforcement learning, which learns through a continual "trial and error" mechanism, can achieve mapless path planning after prior training. Through a study and analysis of several current deep reinforcement learning algorithms, and using only low-dimensional radar data and a small amount of position information, an effective dynamic target-point following strategy is obtained for different smart-home environments, with obstacle avoidance achieved at the same time. Experimental results show that DQN, Dueling Double DQN and DDPG algorithms based on prioritized sampling exhibit strong generalization across different environments.
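The prioritized sampling referred to in the abstract can be sketched as a proportional prioritized experience replay buffer in the style of Schaul et al. The sketch below is an illustrative reconstruction, not the authors' code; the class name, hyperparameters (`alpha`, `beta`, `eps`) and the transition format are assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (illustrative sketch).

    Transitions are sampled with probability p_i^alpha / sum_j p_j^alpha,
    where p_i = |TD error| + eps is the priority of transition i.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.buffer = []       # stored transitions
        self.priorities = []   # one priority per transition
        self.pos = 0           # ring-buffer write index

    def add(self, transition, td_error=None):
        # New transitions get the current max priority so they are
        # replayed at least once before their TD error is known.
        if td_error is not None:
            priority = abs(td_error) + self.eps
        else:
            priority = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(priority)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by
        # non-uniform sampling; normalized by the max for stability.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.buffer[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

The same buffer can back DQN, Dueling Double DQN or the critic of DDPG: after each gradient step, the freshly computed TD errors are written back via `update_priorities`, so transitions the network predicts poorly are revisited more often.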

Key words: Path planning, Target following, Reinforcement learning

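As a concrete illustration of the "low-dimensional radar data and a small amount of position information" state described in the abstract, the sketch below builds such an observation vector and a simple shaped reward for target following. The beam count, reward constants and function names are assumptions for illustration, not the paper's actual design.

```python
import math

def follow_state(ranges, robot_pose, target_pos):
    """Low-dimensional observation: sparse range readings plus the
    target's position expressed relative to the robot.

    ranges     : list of range-sensor readings (e.g. 10 beams), metres
    robot_pose : (x, y, yaw) of the robot in the world frame
    target_pos : (x, y) of the moving target point
    """
    x, y, yaw = robot_pose
    tx, ty = target_pos
    dist = math.hypot(tx - x, ty - y)
    # Bearing of the target in the robot frame, wrapped to (-pi, pi].
    bearing = math.atan2(ty - y, tx - x) - yaw
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return list(ranges) + [dist, bearing]

def follow_reward(dist, min_range, collided, reached,
                  collision_penalty=-100.0, goal_reward=100.0):
    """A commonly used shaped reward (illustrative values): terminal
    bonus/penalty plus a dense term that favours closing the distance
    to the target while staying clear of obstacles."""
    if collided:
        return collision_penalty
    if reached:
        return goal_reward
    return -dist + 0.5 * min_range  # dense shaping term
```

Because the observation stays low-dimensional (a dozen scalars rather than an occupancy grid), no map has to be built: the policy learns to map ranges plus relative target position directly to velocity commands, which is what makes the mapless following described above feasible.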

CLC Number: TP181