Computer Science, 2024, Vol. 51, Issue (11A): 231100139-6. doi: 10.11896/jsjkx.231100139
TANG Jianing1,2,3, LI Chengyang1,2, ZHOU Sida2,3, MA Mengxing1,2,3, SHI Yang1,2
Abstract: When exploring unstructured unknown environments such as mountainous or jungle terrain, a UAV must perform environment perception and trajectory planning simultaneously, without any prior knowledge. Traditional methods, constrained by factors such as the algorithm and the sensors, suffer from a limited exploration range and low efficiency, and are easily disturbed by environmental changes. To address this problem, an autonomous UAV exploration method based on deep reinforcement learning is proposed. Built on the Normalized Advantage Functions (NAF) algorithm, it introduces three algorithmic enhancement mechanisms to enlarge the exploration range and improve the efficiency of the UAV in unstructured unknown environments. Experiments in a custom-built simulation environment show that, compared with the original version, the improved NAF algorithm achieves a larger exploration range and higher efficiency, while exhibiting superior convergence and robustness.
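The base algorithm, NAF (Gu et al., 2016), makes the greedy action of a continuous-action Q-function computable in closed form by restricting the advantage to a quadratic in the action: Q(s,a) = V(s) + A(s,a), where A(s,a) = -1/2 (a - μ(s))ᵀ P(s) (a - μ(s)) and P(s) = L(s)L(s)ᵀ for a network-predicted lower-triangular L(s). The following is a minimal PyTorch sketch of this Q-function head only; the layer sizes, the tanh action squashing, and all names (NAFHead, hidden, ...) are illustrative assumptions, and the paper's three enhancement mechanisms are not reproduced here.

```python
import torch
import torch.nn as nn

class NAFHead(nn.Module):
    """Sketch of the NAF decomposition Q(s,a) = V(s) + A(s,a), with
    A(s,a) = -0.5 * (a - mu(s))^T P(s) (a - mu(s)) and P(s) = L(s) L(s)^T,
    where L(s) is lower-triangular with a positive diagonal."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.action_dim = action_dim
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)        # state value V(s)
        self.mu = nn.Linear(hidden, action_dim)  # greedy action mu(s)
        # Flattened entries of the lower-triangular factor L(s)
        self.l_entries = nn.Linear(hidden, action_dim * (action_dim + 1) // 2)
        self.register_buffer("tril_idx", torch.tril_indices(action_dim, action_dim))

    def forward(self, state: torch.Tensor, action: torch.Tensor):
        h = self.trunk(state)
        v = self.value(h)
        mu = torch.tanh(self.mu(h))  # bounded continuous action

        # Assemble L(s); exponentiate its diagonal so P(s) is positive definite
        L = torch.zeros(state.shape[0], self.action_dim, self.action_dim,
                        device=state.device)
        L[:, self.tril_idx[0], self.tril_idx[1]] = self.l_entries(h)
        diag = torch.arange(self.action_dim, device=state.device)
        L[:, diag, diag] = L[:, diag, diag].exp()
        P = L @ L.transpose(1, 2)  # P(s) = L L^T

        # Quadratic advantage: A(s,a) = -0.5 (a - mu)^T P (a - mu)
        d = (action - mu).unsqueeze(-1)
        adv = -0.5 * (d.transpose(1, 2) @ P @ d).squeeze(-1)
        return v + adv, mu, v  # Q(s,a), mu(s), V(s)
```

Because the quadratic advantage is zero exactly at a = mu(s), the greedy action is mu(s) itself, so continuous control requires no inner maximization over actions; exploration is typically obtained by adding noise to mu(s).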