Computer Science, 2024, Vol. 51, Issue (9): 223-232. doi: 10.11896/jsjkx.230700131

• Artificial Intelligence •

Study on Following Car Model with Different Driving Styles Based on Proximal Policy Optimization Algorithm

YAN Xin, HUANG Zhiqiu, SHI Fan, XU Heng

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Received: 2023-07-19 Revised: 2024-01-19 Online: 2024-09-15 Published: 2024-09-10
  • Corresponding author: HUANG Zhiqiu (zqhuang@nuaa.edu.cn)
  • About author: YAN Xin (yanxinsh@163.com), born in 1999, postgraduate. His main research interests include reinforcement learning and autonomous driving.
    HUANG Zhiqiu, born in 1965, Ph.D, professor, is a distinguished member of CCF (No.09028D). His main research interests include software quality assurance, system safety, and formal methods.
  • Supported by:
    Joint Funds of the National Natural Science Foundation of China (U2241216).

Abstract: Autonomous driving plays a crucial role in reducing traffic congestion and improving driving comfort, and improving public acceptance of the technology remains an important research problem. Customizing driving styles for users with different needs can help drivers understand autonomous driving behavior, improve their riding experience, and to some extent reduce their psychological resistance to using autonomous driving systems. By analyzing car-following behavior in autonomous driving scenarios, this study proposes a design scheme for deep reinforcement learning models with different driving styles based on the proximal policy optimization (PPO) algorithm. First, a large number of driving trajectories from a German highway vehicle dataset (HDD) are analyzed and classified according to time headway (THW), distance headway (DHW), vehicle acceleration, and following speed, and characteristic data for aggressive and conservative driving styles are extracted. On this basis, reward functions that reflect the driver's style are encoded, deep reinforcement learning models with different driving styles are generated through iterative learning, and road simulations are carried out on the highway-env platform. Experimental results show that the PPO-based models with different driving styles are able to complete the task objectives and, compared with the traditional intelligent driver model (IDM), accurately reflect the different driving styles in their driving behavior.

Key words: Autonomous driving, Intelligent driver model, Reinforcement learning, Proximal policy optimization, Principal component analysis, K-means
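
The car-following quantities named in the abstract can be illustrated with a minimal Python sketch. It computes THW from DHW and follower speed, and evaluates the standard IDM acceleration law that serves as the baseline. The parameter values, the class name IDMParams, and the use of the desired time headway T to stand in for "aggressive" versus "conservative" styles are assumptions made for illustration only; they are not taken from the paper, and this is not the authors' reward function or PPO model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IDMParams:
    """Illustrative IDM parameters (assumed values, not from the paper)."""
    v0: float = 33.3     # desired free-flow speed, m/s
    T: float = 1.5       # desired time headway, s
    s0: float = 2.0      # minimum standstill gap, m
    a_max: float = 1.0   # maximum acceleration, m/s^2
    b: float = 1.5       # comfortable deceleration, m/s^2
    delta: float = 4.0   # free-acceleration exponent


def time_headway(dhw: float, v: float) -> float:
    """THW in seconds: distance headway (m) divided by follower speed (m/s)."""
    if v <= 0.0:
        return math.inf  # a stationary follower never closes the distance
    return dhw / v


def idm_acceleration(v: float, gap: float, dv: float, p: IDMParams) -> float:
    """IDM acceleration of the follower.

    v   : follower speed, m/s
    gap : bumper-to-bumper gap to the leader, m (must be > 0)
    dv  : approach rate, v_follower - v_leader, m/s
    """
    # Desired dynamic gap; the max(0, ...) clamp is a common variant that
    # keeps the interaction term from going negative when the leader pulls away.
    interaction = v * p.T + v * dv / (2.0 * math.sqrt(p.a_max * p.b))
    s_star = p.s0 + max(0.0, interaction)
    return p.a_max * (1.0 - (v / p.v0) ** p.delta - (s_star / gap) ** 2)


# Two hypothetical style presets that differ only in desired time headway.
AGGRESSIVE = IDMParams(T=1.0)
CONSERVATIVE = IDMParams(T=2.0)

if __name__ == "__main__":
    v, gap = 20.0, 30.0
    print("THW [s]:", time_headway(gap, v))
    print("aggressive a [m/s^2]:", idm_acceleration(v, gap, 0.0, AGGRESSIVE))
    print("conservative a [m/s^2]:", idm_acceleration(v, gap, 0.0, CONSERVATIVE))
```

In the same traffic state (20 m/s, 30 m gap, equal speeds), the shorter-headway preset keeps accelerating while the longer-headway preset brakes to open the gap. This is the kind of style difference that the clustered THW and DHW features are meant to capture.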

CLC Number: TP391