Computer Science ›› 2024, Vol. 51 ›› Issue (9): 223-232.doi: 10.11896/jsjkx.230700131

• Artificial Intelligence •

Study on Car-following Model with Different Driving Styles Based on Proximal Policy Optimization Algorithm

YAN Xin, HUANG Zhiqiu, SHI Fan, XU Heng   

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Received: 2023-07-19  Revised: 2024-01-19  Online: 2024-09-15  Published: 2024-09-10
  • About author: YAN Xin, born in 1999, postgraduate. His main research interests include reinforcement learning and autonomous driving.
    HUANG Zhiqiu, born in 1965, Ph.D., professor, is a distinguished member of CCF (No.09028D). His main research interests include software quality assurance, system safety, and formal methods.
  • Supported by:
    Joint Funds of the National Natural Science Foundation of China (U2241216).

Abstract: Autonomous driving plays a crucial role in reducing traffic congestion and improving driving comfort, and enhancing public acceptance of autonomous driving technology remains an important research goal. Customizing different driving styles for diverse user needs can help drivers understand autonomous driving behavior, improve the overall driving experience, and reduce psychological resistance to using autonomous driving systems. This study proposes a design approach for deep reinforcement learning models based on the proximal policy optimization (PPO) algorithm, focusing on car-following behavior in autonomous driving scenarios. First, a large dataset of naturalistic vehicle trajectories on German highways (highD) is analyzed, and driving behaviors are classified based on features such as time headway (THW), distance headway (DHW), vehicle acceleration, and following speed; characteristic data for aggressive and conservative driving styles are extracted. On this basis, a reward function encoding driver style is designed. Through iterative learning, deep reinforcement learning models with different driving styles are generated using the PPO algorithm. Simulations are conducted on the highway-env platform. Experimental results demonstrate that the PPO-based driving models with different styles are capable of achieving the task objectives, and, compared with the traditional intelligent driver model (IDM), they accurately reflect distinct driving styles in their driving behavior.
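To make the style-classification step concrete, the following is a minimal sketch of the clustering pipeline described above, assuming the per-vehicle features (THW, DHW, acceleration, following speed) have already been extracted from trajectory data. The synthetic feature values, the two-component PCA, and the THW-based rule for naming the clusters are illustrative assumptions, not the paper's exact settings.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Each row: [THW (s), DHW (m), acceleration (m/s^2), speed (m/s)];
# placeholder values standing in for features extracted from highD.
features = rng.normal(
    loc=[1.8, 45.0, 0.3, 28.0], scale=[0.6, 15.0, 0.5, 4.0], size=(500, 4)
)

# Standardize, project onto principal components, then cluster into
# two styles (aggressive vs. conservative).
scaled = StandardScaler().fit_transform(features)
components = PCA(n_components=2).fit_transform(scaled)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(components)

# A plausible post-hoc naming rule: the cluster with the smaller mean THW
# follows more closely and is treated as the aggressive style.
mean_thw = [features[labels == k, 0].mean() for k in range(2)]
print(f"aggressive style = cluster {int(np.argmin(mean_thw))}")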
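The reward shaping and PPO training could then look like the sketch below. The gymnasium wrapper, the headway-deviation penalty, the weight w_thw, and the info key time_headway are hypothetical stand-ins for the paper's encoded reward function, which is not reproduced here; the environment is assumed to be the open-source highway-env package together with the stable-baselines3 PPO implementation.

import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-v0 environment)
from stable_baselines3 import PPO

class StyleReward(gym.Wrapper):
    """Reshapes the base reward toward a style's preferred time headway."""

    def __init__(self, env, target_thw=1.0, w_thw=0.5):
        super().__init__(env)
        self.target_thw = target_thw  # small THW ~ aggressive, large ~ conservative
        self.w_thw = w_thw

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Hypothetical penalty: deviation from the style's preferred headway.
        thw = info.get("time_headway", self.target_thw)  # assumed info key
        reward -= self.w_thw * abs(thw - self.target_thw)
        return obs, reward, terminated, truncated, info

env = StyleReward(gym.make("highway-v0"), target_thw=0.8)  # aggressive profile
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)  # iterate until the styled policy converges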
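For the baseline, the intelligent driver model [33] used in the comparison reduces to a single acceleration rule, a = a_max[1 - (v/v0)^delta - (s*/s)^2] with desired gap s* = s0 + max(0, v*T + v*dv/(2*sqrt(a_max*b))). The function below writes this out; the parameter defaults are the commonly cited values from Treiber et al., not necessarily the settings used in this study's experiments.

import math

def idm_acceleration(v, v_lead, gap,
                     v0=33.3, T=1.6, a_max=0.73, b=1.67, s0=2.0, delta=4):
    # v: ego speed (m/s), v_lead: leader speed (m/s), gap: bumper-to-bumper gap (m)
    dv = v - v_lead  # closing speed
    s_star = s0 + max(0.0, v * T + v * dv / (2 * math.sqrt(a_max * b)))
    return a_max * (1 - (v / v0) ** delta - (s_star / gap) ** 2)

print(idm_acceleration(v=25.0, v_lead=22.0, gap=30.0))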

Key words: Autonomous driving, Intelligent driver model, Reinforcement learning, Proximal policy optimization, Principal component analysis, K-means

CLC Number: 

  • TP391
[1]WEI J,DOLAN J M,LITKOUHI B.A learning-based autonomous driver:emulate human driver's intelligence in low-speed car following[C]//Unattended Ground,Sea,and Air Sensor Technologies and Applications XII.SPIE,2010,7693:93-104.
[2]KESTING A,TREIBER M,HELBING D.Enhanced intelligent driver model to access the impact of driving strategies on traffic capacity[J].Philosophical Transactions of the Royal Society A:Mathematical,Physical and Engineering Sciences,2010,368(1928):4585-4605.
[3]CAO W,LIU S,LI J,et al.Analysis and design of adaptive cruise control for smart electric vehicle with domain-based poly-service loop delay[J].IEEE Transactions on Industrial Electronics,2022,70(1):866-877.
[4]DARAPANENI N,RAJ P,PADURI A R,et al.Autonomous car driving using deep learning[C]//2021 2nd International Conference on Secure Cyber Computing and Communications(ICSCCC).IEEE,2021:29-33.
[5]YI L M.Lane change of vehicles based on DQN[C]//2020 5th International Conference on Information Science,Computer Technology and Transportation(ISCTT).IEEE,2020:593-597.
[6]GIPPS P G.A behavioural car-following model for computer simulation[J].Transportation Research Part B:Methodological,1981,15(2):105-111.
[7]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[8]SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policy optimization algorithms[J].arXiv:1707.06347,2017.
[9]WANG J,ZHANG L,ZHANG D,et al.An adaptive longitudinal driving assistance system based on driver characteristics[J].IEEE Transactions on Intelligent Transportation Systems,2012,14(1):1-12.
[10]KRAJEWSKI R,BOCK J,KLOEKER L,et al.The highD dataset:A drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems[C]//2018 21st International Conference on Intelligent Transportation Systems(ITSC).IEEE,2018:2118-2125.
[11]HEDRICK J K,TOMIZUKA M,VARAIYA P.Control issues in automated highway systems[J].IEEE Control Systems Magazine,1994,14(6):21-32.
[12]GAO H,KAN Z,LI K.Robust lateral trajectory following control of unmanned vehicle based on model predictive control[J].IEEE/ASME Transactions on Mechatronics,2021,27(3):1278-1287.
[13]XIE G,GAO H,QIAN L,et al.Vehicle trajectory prediction by integrating physics- and maneuver-based approaches using interactive multiple models[J].IEEE Transactions on Industrial Electronics,2017,65(7):5999-6008.
[14]GAO H,ZHU J,LI X,et al.Automatic parking control of unmanned vehicle based on switching control algorithm and backstepping[J].IEEE/ASME Transactions on Mechatronics,2020,27(3):1233-1243.
[15]ZHANG J,LI Q Y,LI D,et al.Merging guidance of exclusive lanes for connected and autonomous vehicles based on deep reinforcement learning[J].Journal of Jilin University(Engineering and Technology Edition),2023,53(9):2508-2518.
[16]VARAIYA P.Smart cars on smart roads:problems of control[J].IEEE Transactions on Automatic Control,1993,38(2):195-207.
[17]GAO H,QIN Y,HU C,et al.An interacting multiple model for trajectory prediction of intelligent vehicles in typical road traffic scenario[J].IEEE Transactions on Neural Networks and Learning Systems,2021.
[18]CODEVILLA F,MÜLLER M,LÓPEZ A,et al.End-to-end driving via conditional imitation learning[C]//2018 IEEE International Conference on Robotics and Automation(ICRA).IEEE,2018:4693-4700.
[19]WANG W,XI J,CHEN H.Modeling and recognizing driver behavior based on driving data:A survey[J].Mathematical Problems in Engineering,2014,2014:245611.
[20]KURITA T.Principal component analysis(PCA)[J/OL].https://doi.org/10.1007/978-3-030-03243-2_649-1.
[21]SHLENS J.A tutorial on principal component analysis[J].arXiv:1404.1100,2014.
[22]JAIN A K,MURTY M N,FLYNN P J.Data clustering:a review[J].ACM Computing Surveys(CSUR),1999,31(3):264-323.
[23]AHMED M,SERAJ R,ISLAM S M S.The k-means algorithm:A comprehensive survey and performance evaluation[J].Electronics,2020,9(8):1295.
[24]KAELBLING L P,LITTMAN M L,MOORE A W.Reinforcement learning:A survey[J].Journal of Artificial Intelligence Research,1996,4:237-285.
[25]SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policy optimization algorithms[J].arXiv:1707.06347,2017.
[26]SAGBERG F,SELPI,BIANCHI PICCININI G F,et al.A review of research on driving styles and road safety[J].Human Factors,2015,57(7):1248-1275.
[27]MURPHEY Y L,MILTON R,KILIARIS L.Driver's style classification using jerk analysis[C]//2009 IEEE Workshop on Computational Intelligence in Vehicles and Vehicular Systems.IEEE,2009:23-28.
[28]DE WAARD D,DIJKSTERHUIS C,BROOKHUIS K A.Merging into heavy motorway traffic by young and elderly drivers[J].Accident Analysis & Prevention,2009,41(3):588-597.
[29]LIU L,LIN J,YAO J,et al.Path planning for smart car based on Dijkstra algorithm and dynamic window approach[J].Wireless Communications and Mobile Computing,2021,2021(1):8881684.
[30]MACADAM C,BAREKET Z,FANCHER P,et al.Using neural networks to identify driving style and headway control behavior of drivers[J].Vehicle System Dynamics,1998,29(S1):143-160.
[31]HELLY W.Simulation of bottlenecks in single-lane traffic flow[J].Theory of Traffic Flow,1959,6(2):207-238.
[32]VAN DER HORST A R A,HOGEMA J H.Time-to-collision and collision avoidance systems[C]//Proceeding of the 6th ICTCT Workshop.1994:59-66.
[33]TREIBER M,HENNECKE A,HELBING D.Congested traffic states in empirical observations and microscopic simulations[J].Physical Review E,2000,62(2):1805.
[34]RAO C R.A review of canonical coordinates and an alternative to correspondence analysis using Hellinger distance[J].Qüestiió:Quaderns d'Estadística i Investigació Operativa,1995,19:23-63.