Computer Science, 2024, Vol. 51, Issue (11A): 231100139-6. DOI: 10.11896/jsjkx.231100139

• Intelligent Computing •

Autonomous Exploration Methods for Unmanned Aerial Vehicles Based on Deep Reinforcement Learning

TANG Jianing1,2,3, LI Chengyang1,2, ZHOU Sida2,3, MA Mengxing1,2,3, SHI Yang1,2   

  1. School of Electrical and Information Technology, Yunnan Minzu University, Kunming 650031, China
    2. Yunnan Key Laboratory of Unmanned Autonomous System, Kunming 650031, China
    3. Institute of Unmanned Autonomous Systems, Yunnan Minzu University, Kunming 650031, China
  • Online: 2024-11-16  Published: 2024-11-13
  • About author: TANG Jianing, born in 1984, Ph.D, professor, Ph.D supervisor. Her main research interest is cooperative guidance and control.
    LI Chengyang, born in 1999, postgraduate. His main research interests include deep reinforcement learning and autonomous exploration of unmanned aerial vehicles.
  • Supported by:
    National Natural Science Foundation of China (61963038, 62063035).

Abstract: In unstructured and unknown environments, such as mountains and jungles, UAVs must perform environment sensing and trajectory planning simultaneously, without any prior information. Traditional methods are constrained by the limitations of their algorithms and sensors, which restricts the exploration range, lowers efficiency, and leaves them susceptible to interference from environmental changes. To address this problem, this study proposes an autonomous exploration method for UAVs based on deep reinforcement learning. The method builds on the normalized advantage functions (NAF) algorithm and introduces three algorithmic enhancement mechanisms to improve the exploration range and efficiency of UAVs in unstructured, unknown environments. Simulation experiments in a self-designed environment show that the improved NAF algorithm achieves a larger exploration range and higher efficiency than the original version, while exhibiting better convergence and robustness.
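The abstract names NAF (normalized advantage functions, introduced by Gu et al., 2016) as the base learner but does not detail it. For orientation, the sketch below shows a minimal NAF network head in PyTorch: it decomposes Q(s, a) = V(s) + A(s, a), with A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)) and P(s) = L(s) L(s)^T for a lower-triangular L(s). The layer sizes, the tanh action bound, and the names (NAFHead, state_dim, action_dim, hidden) are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class NAFHead(nn.Module):
    """NAF Q-network head: Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)),
    where P(s) = L(s) L(s)^T is positive definite by construction.
    A minimal sketch; layer sizes and action bounds are illustrative."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.action_dim = action_dim
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)        # state value V(s)
        self.mu = nn.Linear(hidden, action_dim)  # greedy action mu(s)
        # entries of the lower-triangular matrix L(s)
        self.l_entries = nn.Linear(hidden, action_dim * (action_dim + 1) // 2)
        self.register_buffer("tril_idx", torch.tril_indices(action_dim, action_dim))

    def forward(self, state, action):
        h = self.body(state)
        v = self.value(h)
        mu = torch.tanh(self.mu(h))  # assumes actions normalized to [-1, 1]

        # assemble L(s): fill the lower triangle, keep the diagonal positive
        L = torch.zeros(state.shape[0], self.action_dim, self.action_dim,
                        device=state.device)
        L[:, self.tril_idx[0], self.tril_idx[1]] = self.l_entries(h)
        d = torch.arange(self.action_dim)
        L[:, d, d] = L[:, d, d].exp()

        # quadratic advantage: never positive, zero exactly at a = mu(s)
        P = L @ L.transpose(1, 2)
        delta = (action - mu).unsqueeze(-1)
        adv = -0.5 * (delta.transpose(1, 2) @ P @ delta).squeeze(-1)
        return v + adv, v, mu

# usage (shapes only): q, v, mu = NAFHead(8, 3)(torch.randn(32, 8), torch.randn(32, 3))

Because the advantage term is a negative quadratic in the action, Q(s, a) is maximized in closed form at a = mu(s); this closed-form argmax is what lets NAF apply Q-learning-style updates to continuous control commands, and it is why it suits the continuous action spaces of UAV flight.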

Key words: Autonomous UAV exploration, Intelligent decision-making, Deep reinforcement learning, NAF algorithm, Enhancement mechanism

CLC Number: V249