Computer Science, 2023, Vol. 50, Issue 11A: 220900257-7. DOI: 10.11896/jsjkx.220900257

• Artificial Intelligence •


Autonomous Control Algorithm for Quadrotor Based on Deep Reinforcement Learning

LIANG Ji, WANG Lisong, HUANG Yuzhou, QIN Xiaolin   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Published: 2023-11-09
  • Corresponding author: QIN Xiaolin (qinxcs@nuaa.edu.cn)
  • About author: LIANG Ji, born in 1996, postgraduate (2276835336@qq.com). His main research interests include adaptive UAV control and reinforcement learning.
    QIN Xiaolin, born in 1953, Ph.D, professor, is a member of China Computer Federation. His main research interests include data management, unmanned systems and security in distributed environments.
  • Supported by:
    National Natural Science Foundation of China (61972198).

Abstract: With the wide application of unmanned aerial vehicles (UAVs), the design of UAV controllers has become a hot research topic in recent years. The control algorithms widely used on UAVs, such as PID and MPC, are constrained by factors such as difficult parameter tuning, complex model construction, and heavy computation. To address these problems, a UAV autonomous control method based on deep reinforcement learning is proposed. The method approximates the UAV controller with a neural network that maps the UAV's state directly to actuator outputs to control its motion, and a general UAV controller is obtained through continuous interaction with the environment during training, effectively avoiding complex operations such as parameter tuning and model construction. To further improve the convergence speed and accuracy of the model, expert information is introduced on top of the traditional reinforcement learning algorithm Soft Actor-Critic (SAC), yielding the ESAC algorithm, which guides the UAV's exploration of the environment and enhances the usability and extensibility of the control strategy. Finally, in UAV position control and trajectory tracking tasks, the proposed controller is compared with a traditional PID controller and with model controllers built with reinforcement learning algorithms such as SAC and DDPG. Experimental results show that the controller built with the ESAC algorithm achieves control performance equal to or better than that of the PID controller, and outperforms the controllers built with SAC and DDPG in stability and accuracy.

Key words: Reinforcement learning, Quadrotor, Autonomous control, Expert policy
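
The abstract describes ESAC only at a high level: a neural-network policy maps the UAV state directly to actuator (motor) commands, and expert information is added on top of SAC to guide exploration. The exact mechanism is not given here, so the following Python sketch is only one plausible reading under stated assumptions: an expert controller (for example, a tuned PID) is blended with the SAC actor's sampled action using a weight that decays over training. The class and function names, the blending schedule, and the environment interface (policy, expert, env, replay_buffer) are hypothetical, not the authors' code.

import numpy as np

class ExpertGuidedExploration:
    """Blends an expert controller's action with the SAC actor's action.

    The expert weight beta decays toward beta_min, so the learned policy
    gradually takes over exploration as training progresses.
    """

    def __init__(self, beta0=1.0, decay=1e-4, beta_min=0.0):
        self.beta = beta0        # current expert weight
        self.decay = decay       # per-step decay of the expert weight
        self.beta_min = beta_min

    def select_action(self, policy_action, expert_action):
        # Convex combination of the two actions; sampling the expert action
        # outright with probability beta would be an equally plausible scheme.
        action = self.beta * expert_action + (1.0 - self.beta) * policy_action
        self.beta = max(self.beta_min, self.beta - self.decay)
        return action

def collect_transition(env, state, policy, expert, mixer, replay_buffer):
    """One environment step: act with the blended action and store the transition.

    The action vector is assumed to hold the four rotor (actuator) commands,
    matching the state-to-actuator mapping described in the abstract.
    """
    a_policy = np.asarray(policy(state))   # stochastic sample from the SAC actor (hypothetical callable)
    a_expert = np.asarray(expert(state))   # e.g. a tuned PID position controller (hypothetical callable)
    action = mixer.select_action(a_policy, a_expert)
    next_state, reward, done, info = env.step(action)   # gym-style interface assumed
    replay_buffer.append((state, action, reward, next_state, done))
    return next_state, done

Under this reading, storing the expert-influenced transitions in the ordinary replay buffer lets the SAC critic learn from near-expert data early in training, while the entropy-regularized actor gradually takes over exploration as the expert weight decays.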

CLC Number: TP391