Computer Science (计算机科学), 2019, Vol. 46, Issue 11A: 94-97.
XU Ji-ning (徐继宁), ZENG Jie (曾杰)
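Abstract: Path planning has long been a central topic in robot motion control research. Current path planning methods spend a great deal of time building a map, whereas reinforcement learning, which relies on repeated trial and error, can plan paths without a map once it has been trained in advance. By studying and analyzing several current deep reinforcement learning algorithms, and using only low-dimensional radar data and a small amount of position information, this work arrives at effective strategies for tracking a dynamic target point in different smart-home environments while also avoiding obstacles. Experimental results show that the DQN, Dueling Double DQN, and DDPG algorithms based on prioritized sampling generalize well across different environments.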