Computer Science ›› 2021, Vol. 48 ›› Issue (9): 235-243. doi: 10.11896/jsjkx.201000084

• Artificial Intelligence •


Action Constrained Deep Reinforcement Learning Based Safe Automatic Driving Method

DAI Shan-shan1, LIU Quan1,2,3,4   

1 School of Computer Science and Technology,Soochow University,Suzhou,Jiangsu 215006,China
    2 Provincial Key Laboratory for Computer Information Processing Technology,Soochow University,Suzhou,Jiangsu 215006,China
    3 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education,Jilin University,Changchun 130012,China
    4 Collaborative Innovation Center of Novel Software Technology and Industrialization,Nanjing 210000,China
  • Received:2020-10-16 Revised:2021-03-04 Online:2021-09-15 Published:2021-09-10
  • Corresponding author:LIU Quan (liuquan@suda.edu.cn)
  • About author:DAI Shan-shan,born in 1990,postgraduate candidate (20185427004@stu.suda.edu.cn).Her main research interests include reinforcement learning,deep reinforcement learning and autonomous driving.
    LIU Quan,born in 1969,Ph.D,professor,doctoral supervisor.His main research interests include reinforcement learning,deep reinforcement learning and automated reasoning.
  • Supported by:
National Natural Science Foundation of China(61772355,61702055,61502323,61502329),Major Projects of Natural Science Research in Jiangsu Higher Education Institutions(18KJA520011,17KJA520004),Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education,Jilin University(93K172014K04,93K172017K18),Suzhou Applied Basic Research Program (Industrial Part)(SYG201422) and the Priority Academic Program Development of Jiangsu Higher Education Institutions.


Abstract: With the development of artificial intelligence,research on autonomous driving is growing rapidly,and deep reinforcement learning (DRL) is one of the main research methods in this field,in which safe exploration is a research hotspot.Although DRL algorithms have achieved excellent performance in many control tasks,most of them place no safety restrictions on exploration in order to improve sample coverage,so a driverless car may fall into dangerous states while exploring and learning may fail;this unconstrained exploration usually restricts the application of DRL to automatic driving.For example,in common reinforcement learning (RL) algorithms,an agent has to select an action to execute in each state even though this action may result in a crash,deteriorating performance,or task failure.To solve this problem,this paper proposes a constrained soft actor-critic algorithm (CSAC),in which a 'NO-OP' (no-option) mechanism identifies and replaces inappropriate actions,and evaluates it on lane-keeping tasks.The method first limits the environmental reward reasonably:a driverless car shakes when its steering angle is too large,so a penalty term is added to the reward function to keep the car away from dangerous states as far as possible.CSAC then constrains the agent's actions:when the action selected in the current state drives the car off the lane or causes a collision,that action is marked as a constrained action,and the constraints are used in subsequent training to better guide the car in selecting new actions.The contributions of this paper are as follows:first,an action-constraint function is combined with the SAC algorithm,which achieves faster learning and higher stability,and in comparison experiments on the lane-keeping task CSAC effectively avoids unsafe actions,improves driving stability,and speeds up training relative to SAC;second,a reward-setting framework is proposed that overcomes the shaking and instability of driverless cars;finally,the model is trained in the Unity virtual environment for evaluation and successfully transplanted to a Raspberry Pi-based Donkey driverless car,further verifying its generality.

Key words: Deep reinforcement learning, Driverless cars, Lane-keeping, Safe automatic driving, Soft actor-critic
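For reference, SAC, the algorithm that CSAC extends, maximizes the entropy-regularized return introduced in [16]; a standard statement of that objective (background from the SAC literature, not a formula given in this abstract) is

J(\pi)=\sum_{t=0}^{T}\mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\left[r(s_t,a_t)+\alpha\,\mathcal{H}\left(\pi(\cdot\mid s_t)\right)\right],

where \rho_\pi is the state-action distribution induced by the policy \pi, \mathcal{H} is the policy entropy, and \alpha is the temperature coefficient. In CSAC as described above, the reward r additionally carries the steering penalty, and the support of \pi(\cdot\mid s_t) is pruned by the action constraints.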
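To make the two safety mechanisms concrete, the following is a minimal Python sketch of how a steering-angle penalty and a table of constrained actions could be wrapped around any off-the-shelf SAC agent. Every name and constant here (shaped_reward, ActionConstraint, STEER_LIMIT, ACTION_BIN, and so on) is an illustrative assumption, not the authors' published implementation.

```python
from collections import defaultdict

# Illustrative constants; the paper does not publish these values.
STEER_LIMIT = 0.35   # rad; steering beyond this is assumed to cause shaking
LAMBDA_STEER = 0.5   # weight of the steering penalty term
ACTION_BIN = 0.05    # resolution used to index constrained steering actions


def shaped_reward(base_reward: float, steering_angle: float) -> float:
    """Environment reward minus a penalty for over-large steering angles."""
    penalty = LAMBDA_STEER * max(0.0, abs(steering_angle) - STEER_LIMIT)
    return base_reward - penalty


class ActionConstraint:
    """Records actions that led to lane departure or collision in a given
    state and filters them out of later selection ('NO-OP' replacement)."""

    def __init__(self) -> None:
        self.blocked = defaultdict(set)  # state key -> blocked action bins

    @staticmethod
    def _bin(action: float) -> int:
        return round(action / ACTION_BIN)

    def mark_unsafe(self, state_key, action: float) -> None:
        self.blocked[state_key].add(self._bin(action))

    def select(self, state_key, sample_action, max_tries: int = 10) -> float:
        """Resample from the policy until an unblocked action is found."""
        for _ in range(max_tries):
            action = sample_action()
            if self._bin(action) not in self.blocked[state_key]:
                return action
        return 0.0  # fall back to a neutral steering command


# Hypothetical use inside a SAC training loop (agent and env are placeholders):
#
#   constraint = ActionConstraint()
#   action = constraint.select(key, lambda: agent.sample_action(state))
#   next_state, reward, crashed, off_lane = env.step(action)
#   reward = shaped_reward(reward, steering_angle=action)
#   if crashed or off_lane:
#       constraint.mark_unsafe(key, action)
```

In practice the raw camera state would have to be discretized or embedded before it can serve as a state key; the abstract does not specify how the authors index constrained actions.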

CLC number: TP181
[1]ORT T,PAULL L,RUS D.Autonomous vehicle navigation in rural environments without detailed prior maps[C]//2018 IEEE International Conference on Robotics and Automation (ICRA).IEEE,2018:2040-2047.
[2]PENDLETON S D,ANDERSEN H,DU X X,et al.Perception,Planning,Control,and Coordination for Autonomous Vehicles[J].Machines,2017,5(1):6.
[3]CAPORALE D,SETTIMI A,MASSA F,et al.Towards the Design of Robotic Drivers for Full-Scale Self-Driving Racing Cars [C]//2019 International Conference on Robotics and Automation (ICRA).IEEE,2019:5643-5649.
[4]ZHUANG L,ZHANG Z,WANG L.The automatic segmentation of residential solar panels based on satellite images:A cross learning driven U-Net method[J].Applied Soft Computing,2020,92:106283.
[5]VEDDER B,SVENSSON B J,VINTER J,et al.Automated Testing of Ultrawideband Positioning for Autonomous Driving[J].Journal of Robotics,2020,2020:1-15.
[6]BOJARSKI M,DEL TESTA D,DWORAKOWSKI D,et al.End to End Learning for Self-Driving Cars[J].arXiv:1604.07316,2016.
[7]XU H,GAO Y,YU F,et al.End-to-End Learning of Driving Models from Large-Scale Video Datasets[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).IEEE,2017:2174-2182.
[8]CHEN L,WANG Q,LU X,et al.Learning Driving ModelsFrom Parallel End-to-End Driving Data Set[J].Proceedings of the IEEE,2020,108(2):262-273.
[9]CODEVILLA F,MULLER M.End-to-end driving via conditional imitation learning[C]//2018 IEEE International Conference on Robotics and Automation (ICRA).IEEE,2018:4693-4700.
[10]SUTTON R S,BARTO A G.Reinforcement learning:An introduction[M].MIT Press,1998.
[11]JARITZ M,DE CHARETTE R,TOROMANOFF M,et al.End-to-End Race Driving with Deep Reinforcement Learning[C]//2018 IEEE International Conference on Robotics and Automation (ICRA).IEEE,2018:2070-2075.
[12]KENDALL A,HAWKE J,JANZ D,et al.Learning to Drive in a Day[C]//2019 International Conference on Robotics and Automation (ICRA).IEEE,2019:8248-8254.
[13]TOROMANOFF M,WIRBEL E,MOUTARDE F.End-to-End Model-Free Reinforcement Learning for Urban Driving Using Implicit Affordances[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:7153-7162.
[14]CHEN S,WANG M,SONG W,et al.Stabilization Approaches for Reinforcement Learning-Based End-to-End Autonomous Driving[J].IEEE Transactions on Vehicular Technology,2020,69(5):4740-4750.
[15]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[16]HAARNOJA T,ZHOU A,ABBEEL P,et al.Soft Actor-Critic:Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor[C]//International Conference on Machine Learning (ICML).2018.
[17]SHI W,SONG S,WU C.Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning [C]//International Joint Conference on Artificial Intelligence (IJCAI).2019.
[18]ZHU F,WU W,FU Y C,et al.Safe deep reinforcement learning method based on double deep network[J].Chinese Journal of Computers,2019,42(8).
[19]GARCÍA J,FERNÁNDEZ F.A comprehensive survey on safe reinforcement learning[J].Journal of Machine Learning Research,2015,16(1):1437-1480.
[20]GARCÍA J,FERNÁNDEZ F.Safe Exploration of State and Action Spaces in Reinforcement Learning[J].Journal of Artificial Intelligence Research,2012,45:515-564.
[21]BERKENKAMP F,TURCHETTA M,SCHOELLIG A P,et al.Safe model-based reinforcement learning with stability guarantees[J].arXiv:1705.08551,2017.
[22]MAZUMDER S,LIU B,WANG S,et al.Action permissibility in deep reinforcement learning and application to autonomous driving[C]//KDD'18 Deep Learning Day.2018.
[23]LIU Q,ZHAI J W,ZHANG Z,et al.A review of deep reinforcement learning[J].Chinese Journal of Computers,2018,41(1):1-27.
[24]LEE K,SAIGOL K,THEODOROU E A.Early Failure Detection of Deep End-to-End Control Policy by Reinforcement Learning[C]//2019 International Conference on Robotics and Automation (ICRA).IEEE,2019.
[25]FUJIMOTO S,HOOF H,MEGER D.Addressing function approximation error in actor-critic methods[C]//International Conference on Machine Learning.PMLR,2018:1587-1596.
[26]ZIEBART B D,MAAS A L,BAGNELL J A,et al.Maximum entropy inverse reinforcement learning[C]//AAAI Conference on Artificial Intelligence (AAAI).2008:1433-1438.
[27]LEVINE S,FINN C,DARRELL T,et al.End-to-End Training of Deep Visuomotor Policies[J].Journal of Machine Learning Research,2016,17(1):1334-1373.
[28]O'DONOGHUE B,MUNOS R,KAVUKCUOGLU K,et al.PGQ:Combining policy gradient and Q-learning[J].arXiv:1611.01626,2016.
[29]NACHUM O,NOROUZI M,XU K,et al.Bridging the gap between value and policy based reinforcement learning[C]//Advances in Neural Information Processing Systems (NIPS).2017:2772-2782.
[30]HAARNOJA T,TANG H,ABBEEL P,et al.Reinforcement learning with deep energy-based policies[C]//International Conference on Machine Learning (ICML).2017:1352-1361.
[31]MINK J W.The basal ganglia:focused selection and inhibition of competing motor programs[J].Progress in Neurobiology,1996,50(4):381-425.
[32]LIPTON Z C,AZIZZADENESHELI K,KUMAR A,et al.Combating reinforcement learning's sisyphean curse with intrinsic fear[J].arXiv:1611.01211,2016.
[33]AGARWAL A,ABHINAU K V,DUNOVAN K,et al.Better Safe than Sorry:Evidence Accumulation Allows for Safe Reinforcement Learning[J].arXiv:1809.09147,2018.
[34]REN J,MCISAAC K A,PATEL R V,et al.A potential field model using generalized sigmoid functions[J].IEEE Transactions on Systems,Man,and Cybernetics,Part B (Cybernetics),2007,37(2):477-484.
[35]GOMES G S,LUDERMIR T B.Complementary log-log and probit:activation functions implemented in artificial neural networks[C]//2008 Eighth International Conference on Hybrid Intelligent Systems.IEEE,2008:939-942.
[36]SCHULMAN J,ABBEEL P,CHEN X.Equivalence between policy gradients and soft Q-learning[J].arXiv:1704.06440,2017.
[37]CHEN Z,HUANG X.End-to-end learning for lane keeping of self-driving cars[C]//2017 IEEE Intelligent Vehicles Sympo-sium (IV).IEEE,2017.
[38]BADRINARAYANAN V,KENDALL A,CIPOLLA R.SegNet:A deep convolutional encoder-decoder architecture for image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(12):2481-2495.
[39]SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[40]SILVER D,HUBERT T,SCHRITTWIESER J,et al.Mastering chess and shogi by self-play with a general reinforcement learning algorithm[J].arXiv:1712.01815,2017.