Computer Science, 2023, Vol. 50, Issue 11A: 220900257-7. doi: 10.11896/jsjkx.220900257
LIANG Ji, WANG Lisong, HUANG Yuzhou, QIN Xiaolin
Abstract: With the widespread application of unmanned aerial vehicles (UAVs), the design of UAV controllers has become a prominent research topic in recent years. Control algorithms widely used on current UAVs, such as PID and MPC, are constrained by difficult parameter tuning, complex model construction, and heavy computational cost. To address these problems, this paper proposes an autonomous UAV control method based on deep reinforcement learning. The method fits the UAV controller with a neural network that maps the UAV's state directly to actuator outputs to drive the UAV's motion; through continual interaction with the environment during training, a general-purpose UAV controller is obtained, effectively avoiding complex operations such as parameter tuning and model construction. To further improve the convergence speed and accuracy of the model, expert information is introduced on top of the classical Soft Actor-Critic (SAC) reinforcement learning algorithm, yielding the proposed ESAC algorithm, which guides the UAV's exploration of the environment and enhances the usability and extensibility of the control policy. Finally, on UAV position-control and trajectory-tracking tasks, the proposed method is compared with a conventional PID controller and with controllers built from reinforcement learning algorithms such as SAC and DDPG. Experimental results show that the controller built with the ESAC algorithm matches or exceeds the control performance of the PID controller, while outperforming the SAC- and DDPG-based controllers in stability and accuracy.
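Although the abstract does not give implementation details, the following minimal PyTorch sketch illustrates the kind of setup it describes: a neural-network policy mapping the UAV state directly to actuator outputs, trained with a SAC actor loss augmented by an expert-guidance term. The state and action dimensions, network sizes, the behavior-cloning regularizer, and the bc_weight coefficient are illustrative assumptions, not the authors' ESAC implementation.

```python
# Minimal sketch of an SAC actor update with an added expert-guidance term,
# in the spirit of the ESAC idea described in the abstract. All dimensions,
# architectures, and the imitation regularizer below are assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 12, 4  # assumed: 12-D UAV state, 4 rotor commands

class Actor(nn.Module):
    """Squashed-Gaussian SAC policy: maps UAV state to actuator outputs."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                                  nn.Linear(256, 256), nn.ReLU())
        self.mu = nn.Linear(256, ACTION_DIM)
        self.log_std = nn.Linear(256, ACTION_DIM)

    def forward(self, s):
        h = self.body(s)
        dist = torch.distributions.Normal(
            self.mu(h), self.log_std(h).clamp(-20, 2).exp())
        raw = dist.rsample()          # reparameterized sample
        a = torch.tanh(raw)           # squash actuator commands into [-1, 1]
        # log-probability with the standard tanh change-of-variables correction
        logp = dist.log_prob(raw).sum(-1)
        logp -= (2 * (math.log(2) - raw - F.softplus(-2 * raw))).sum(-1)
        return a, logp

class Critic(nn.Module):
    """Q(s, a) network used by the SAC actor objective."""
    def __init__(self):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 256),
                               nn.ReLU(), nn.Linear(256, 1))

    def forward(self, s, a):
        return self.q(torch.cat([s, a], dim=-1)).squeeze(-1)

def esac_actor_loss(actor, critic, states, expert_actions,
                    alpha=0.2, bc_weight=0.5):
    """Standard SAC actor loss plus an assumed expert-imitation term."""
    actions, logp = actor(states)
    sac_term = (alpha * logp - critic(states, actions)).mean()
    # Expert guidance (assumed form): pull the policy toward expert actions.
    bc_term = F.mse_loss(actions, expert_actions)
    return sac_term + bc_weight * bc_term

# Usage on a random batch (hypothetical data, for shape checking only):
actor, critic = Actor(), Critic()
s = torch.randn(32, STATE_DIM)                 # batch of UAV states
a_exp = torch.rand(32, ACTION_DIM) * 2 - 1     # expert commands in [-1, 1]
esac_actor_loss(actor, critic, s, a_exp).backward()
```

One plausible design rationale for such a regularizer is that early in training the SAC policy explores almost randomly, so anchoring it to expert actions can speed convergence; the abstract's claim that expert information improves convergence speed and accuracy is consistent with this pattern, though the paper's exact mechanism may differ.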