Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 220900257-7. doi: 10.11896/jsjkx.220900257

• Artificial Intelligence •

Autonomous Control Algorithm for Quadrotor Based on Deep Reinforcement Learning

LIANG Ji, WANG Lisong, HUANG Yuzhou, QIN Xiaolin   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Published: 2023-11-09
  • About author: LIANG Ji, born in 1996, postgraduate. His main research interests include adaptive UAV control and reinforcement learning.
    QIN Xiaolin, born in 1953, Ph.D., professor, is a member of the China Computer Federation. His main research interests include data management, unmanned systems, and security in distributed environments.
  • Supported by:
    National Natural Science Foundation of China (61972198).

Abstract: With the wide application of unmanned aerial vehicles (UAVs), UAV controller design has become a hot research topic in recent years. Control algorithms widely used in UAVs, such as PID and MPC, are limited by difficult parameter tuning, complex model construction, and high computational cost. To address these problems, a UAV autonomous control method based on deep reinforcement learning is proposed. The method fits the UAV controller with a neural network that maps the UAV state directly to actuator outputs to control the UAV's motion, and it obtains a general UAV controller through continuous interactive training with the environment. This avoids complex operations such as parameter tuning and model building. To further improve the convergence speed and accuracy of the model, an ESAC algorithm is proposed that introduces expert information into the reinforcement learning algorithm soft actor-critic (SAC); the expert information guides the UAV's exploration of the environment and enhances the ease of use of the control strategy. Finally, in UAV position control and trajectory tracking tasks, the ESAC controller is compared with a traditional PID controller and with controllers built using SAC, DDPG, and other reinforcement learning algorithms. Experimental results show that the ESAC controller reaches the same level of performance as the PID controller and outperforms the SAC and DDPG controllers in stability and accuracy.
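
The abstract does not say how expert information enters SAC, so the following Python sketch only illustrates one common pattern for expert-guided exploration and should not be read as the authors' ESAC algorithm. It assumes a simple PID controller as the expert, a probability of substituting the expert's action for the policy's action during data collection, and a linear decay of that probability. All names (PIDExpert, guided_action, expert_prob_schedule) and all gains and constants are hypothetical.

```python
import random


class PIDExpert:
    """Hypothetical 1-D PID controller standing in for the 'expert'.

    Gains and time step are illustrative assumptions, not values from the paper.
    """

    def __init__(self, kp=2.0, ki=0.0, kd=1.0, dt=0.02):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def act(self, error):
        """Return a normalized command in [-1, 1] for the given tracking error."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-1.0, min(1.0, u))


def expert_prob_schedule(step, decay_steps=10_000, start=0.5):
    """Linearly anneal the probability of using the expert down to zero."""
    return max(0.0, start * (1.0 - step / decay_steps))


def guided_action(policy_action, expert_action, expert_prob, rng=random):
    """Pick the expert's action with probability expert_prob, else the policy's.

    In an off-policy method such as SAC, the chosen action would be executed
    and the resulting transition stored in the replay buffer as usual.
    """
    if rng.random() < expert_prob:
        return expert_action
    return policy_action


if __name__ == "__main__":
    expert = PIDExpert()
    rng = random.Random(0)
    for step in (0, 5_000, 10_000):
        p = expert_prob_schedule(step)
        a = guided_action(policy_action=0.0,
                          expert_action=expert.act(error=0.1),
                          expert_prob=p, rng=rng)
        print(step, p, a)
```

Annealing the expert probability to zero is a design choice made here so that the learned policy eventually acts on its own; other ways of injecting expert information (for example, pre-filling the replay buffer or adding an imitation term to the loss) are equally plausible readings of the abstract.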

Key words: Reinforcement learning, Quadrotor, Autonomous control, Expert policy

CLC Number: TP391