Computer Science ›› 2021, Vol. 48 ›› Issue (9): 235-243. doi: 10.11896/jsjkx.201000084

• Artificial Intelligence •

Action Constrained Deep Reinforcement Learning Based Safe Automatic Driving Method

DAI Shan-shan1, LIU Quan1,2,3,4   

  1 School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
    2 Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006, China
    3 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
    4 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
  • Received:2020-10-16 Revised:2021-03-04 Online:2021-09-15 Published:2021-09-10
  • About author:DAI Shan-shan, born in 1990, postgraduate candidate. Her main research interests include reinforcement learning, deep reinforcement learning and automatic driving.
    LIU Quan,born in 1969,Ph.D,professor,supervisor.His main research interests include reinforcement learning,deep reinforcement learning and automated reasoning.
  • Supported by:
    National Natural Science Foundation of China (61772355, 61702055, 61502323, 61502329), Jiangsu Province Natural Science Research University Major Projects (18KJA520011, 17KJA520004), Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04, 93K172017K18), Suzhou Industrial Application of Basic Research Program Part (SYG201422) and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Abstract: With the development of artificial intelligence, the field of autonomous driving is also growing. Deep reinforcement learning (DRL) is one of the main research methods in this field, and DRL algorithms have been reported to achieve excellent performance in many control tasks. However, the unconstrained exploration in the learning process of DRL usually restricts its application to automatic driving. For example, in common reinforcement learning (RL) algorithms, an agent must select an action to execute in each state even though that action may result in a crash, deteriorate performance, or fail the task entirely. To solve this problem, this paper proposes a new action-constrained soft actor-critic method (CSAC), in which a 'NO-OP' (NO-Option) mechanism identifies inappropriate actions and replaces them, and tests the algorithm on lane-keeping tasks. The method first constrains the environmental reward reasonably: when the rotation angle of the driverless car is too large the car shakes, so a penalty term is added to the reward function to keep the car out of dangerous states as far as possible. The contributions of this paper are as follows. First, we incorporate an action-constraint function into the SAC algorithm, which achieves a faster learning speed and higher stability. Second, we propose a reward-setting framework that overcomes the shaking and instability of driverless cars, achieving better performance. Finally, we train the model in the Unity virtual environment to evaluate its performance and successfully transfer it to a Donkey driverless car.
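The two mechanisms described in the abstract, replacing an inappropriate action with 'NO-OP' before it is executed and adding a steering-angle penalty to the reward, can be illustrated with a minimal Python sketch. The safety predictor would_leave_lane, the NO_OP vector, and the threshold and penalty constants below are hypothetical stand-ins introduced for illustration, not the authors' actual interface or values.

```python
# Minimal sketch of CSAC's two safety mechanisms, as described in the abstract.
# `would_leave_lane`, NO_OP, STEER_LIMIT and SHAKE_PENALTY are illustrative
# assumptions, not the paper's actual implementation.
import numpy as np

NO_OP = np.array([0.0, 0.0])   # [steering, throttle]: hold course, do nothing
STEER_LIMIT = 0.6              # assumed threshold for "rotation angle too large"
SHAKE_PENALTY = 0.5            # assumed weight of the anti-shaking penalty term

def constrain_action(action, state, would_leave_lane):
    """Replace an inappropriate action with NO-OP before it reaches the car.

    `would_leave_lane(state, action)` stands in for whatever component
    identifies actions that would crash or drift out of the lane.
    """
    return NO_OP if would_leave_lane(state, action) else action

def shaped_reward(base_reward, steering_angle):
    """Subtract a penalty term when the steering angle is too large,
    discouraging the shaking behaviour described in the abstract."""
    penalty = SHAKE_PENALTY * max(0.0, abs(steering_angle) - STEER_LIMIT)
    return base_reward - penalty
```

Because the constraint is applied before an action is executed, unsafe actions are filtered out during exploration rather than merely discouraged after the fact.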

Key words: Deep reinforcement learning, Driverless cars, Lane-keeping, Safe automatic driving, Soft actor-critic

CLC Number: TP181