Computer Science, 2024, Vol. 51, Issue (11A): 231100139-6. doi: 10.11896/jsjkx.231100139
TANG Jianing1,2,3, LI Chengyang1,2, ZHOU Sida2,3, MA Mengxing1,2,3, SHI Yang1,2
Abstract: When exploring unstructured unknown environments such as mountainous or jungle terrain, a UAV must perform environment perception and trajectory planning simultaneously, without any prior knowledge. Traditional methods, constrained by factors such as the algorithm and the sensors, suffer from a limited exploration range and low efficiency, and are easily disturbed by environmental changes. To address this problem, an autonomous UAV exploration method based on deep reinforcement learning is proposed. Built on the Normalized Advantage Functions (NAF) algorithm, it introduces three algorithmic enhancement mechanisms to enlarge the exploration range and improve the efficiency of the UAV in unstructured unknown environments. Experiments in a custom-built simulation environment show that, compared with the original version, the improved NAF algorithm achieves a larger exploration range and higher efficiency, while exhibiting superior convergence and robustness.
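The base algorithm, NAF (Gu et al., 2016), makes the greedy action of a continuous-action Q-function computable in closed form by restricting the advantage to a quadratic in the action: Q(s,a) = V(s) + A(s,a), where A(s,a) = -1/2 (a - μ(s))ᵀ P(s) (a - μ(s)) and P(s) = L(s)L(s)ᵀ for a network-predicted lower-triangular L(s). The following is a minimal PyTorch sketch of this Q-function head only; the layer sizes, the tanh action squashing, and all names (NAFHead, hidden, ...) are illustrative assumptions, and the paper's three enhancement mechanisms are not reproduced here.

```python
import torch
import torch.nn as nn

class NAFHead(nn.Module):
    """Sketch of the NAF decomposition Q(s,a) = V(s) + A(s,a), with
    A(s,a) = -0.5 * (a - mu(s))^T P(s) (a - mu(s)) and P(s) = L(s) L(s)^T,
    where L(s) is lower-triangular with a positive diagonal."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.action_dim = action_dim
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)        # state value V(s)
        self.mu = nn.Linear(hidden, action_dim)  # greedy action mu(s)
        # Flattened entries of the lower-triangular factor L(s)
        self.l_entries = nn.Linear(hidden, action_dim * (action_dim + 1) // 2)
        self.register_buffer("tril_idx", torch.tril_indices(action_dim, action_dim))

    def forward(self, state: torch.Tensor, action: torch.Tensor):
        h = self.trunk(state)
        v = self.value(h)
        mu = torch.tanh(self.mu(h))  # bounded continuous action

        # Assemble L(s); exponentiate its diagonal so P(s) is positive definite
        L = torch.zeros(state.shape[0], self.action_dim, self.action_dim,
                        device=state.device)
        L[:, self.tril_idx[0], self.tril_idx[1]] = self.l_entries(h)
        diag = torch.arange(self.action_dim, device=state.device)
        L[:, diag, diag] = L[:, diag, diag].exp()
        P = L @ L.transpose(1, 2)  # P(s) = L L^T

        # Quadratic advantage: A(s,a) = -0.5 (a - mu)^T P (a - mu)
        d = (action - mu).unsqueeze(-1)
        adv = -0.5 * (d.transpose(1, 2) @ P @ d).squeeze(-1)
        return v + adv, mu, v  # Q(s,a), mu(s), V(s)
```

Because the quadratic advantage is zero exactly at a = mu(s), the greedy action is mu(s) itself, so continuous control requires no inner maximization over actions; exploration is typically obtained by adding noise to mu(s).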