Vehicle Control at Signal-free Intersections Based on Deep Reinforcement Learning

Computer Science, 2022, Vol. 49, Issue (3): 46-51. doi: 10.11896/jsjkx.210700010

• Emerging Distributed Computing Technologies and Systems

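OUYANG Zhuo¹, ZHOU Si-yuan¹,², LYU Yong¹, TAN Guo-ping¹,², ZHANG Yue¹, XIANG Liang-liang¹

1 School of Computer and Information, Hohai University, Nanjing 211100
2 Jiangsu Intelligent Transportation and Intelligent Driving Research Institute, Nanjing 210019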

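Received: 2021-07-01  Revised: 2021-08-28  Published: 2022-03-15  Released: 2022-03-15
Corresponding author: TAN Guo-ping (gptan@hhu.edu.cn)
About the author: (191307020022@hhu.edu.cn)
Funding:
National Natural Science Foundation of China (61701168, 61832005); China Postdoctoral Science Foundation (2019M651546); Major Science and Technology Project of the Jiangsu Provincial Department of Transportation (2019Z07)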

DRL-based Vehicle Control Strategy for Signal-free Intersections

OUYANG Zhuo¹, ZHOU Si-yuan¹,², LYU Yong¹, TAN Guo-ping¹,², ZHANG Yue¹, XIANG Liang-liang¹

1 School of Computer and Information, Hohai University, Nanjing 211100, China
2 Jiangsu Intelligent Transportation and Intelligent Driving Research Institute, Nanjing 210019, China

Received: 2021-07-01  Revised: 2021-08-28  Online: 2022-03-15  Published: 2022-03-15
About author: OUYANG Zhuo, born in 1995, postgraduate. His main research interests include wireless communication theory and cooperative communications.
TAN Guo-ping, born in 1975, Ph.D, professor, Ph.D supervisor. His main research interests include Internet of vehicles, mobile edge computing, and wireless distributed machine learning.
Supported by:
National Natural Science Foundation of China (61701168, 61832005), China Postdoctoral Science Funded Project (2019M651546) and Major Technological Projects of Jiangsu Province Transportation Department (2019Z07).

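Abstract

Abstract: Applying deep reinforcement learning to vehicle control at signal-free intersections is an active research topic in intelligent transportation. Existing approaches cannot adapt to dynamic changes in the number of autonomous vehicles, converge slowly during training, and reach only locally optimal results. This paper studies how autonomous vehicles at a signal-free intersection can use distributed deep reinforcement learning to improve the traffic efficiency of the intersection. First, an efficient reward function is proposed and a distributed reinforcement learning algorithm is applied to the signal-free intersection scenario, so that vehicles relying only on local information, without access to the state of the whole intersection, can still effectively improve traffic efficiency. Second, to address the low training efficiency of reinforcement learning in the open intersection scenario, transfer learning is used: a policy trained in the closed figure-eight scenario serves as a warm start, and training then continues in the signal-free intersection scenario, which improves training efficiency. Finally, a policy that adapts to any proportion of autonomous vehicles is proposed; it improves traffic efficiency in scenarios with any proportion of autonomous vehicles. The TD3 reinforcement learning algorithm is validated on the Flow simulation platform. The experimental results show that the improved algorithm converges quickly, adapts to dynamic changes in the proportion of autonomous vehicles, and effectively improves the traffic efficiency of the intersection.

Keywords: V2X, Deep reinforcement learning, Signal-free intersections, Autonomous driving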
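The abstract names TD3 as the learning algorithm but does not give its update rule. As a reminder of what distinguishes TD3 from DDPG, the following is a minimal NumPy sketch of the critic target, which combines target policy smoothing with clipped double-Q learning. The function name `td3_target`, the toy actor and critics, and all hyperparameter values are assumptions made for illustration; none of them are taken from the paper or from its Flow/RLlib configuration.

```python
import numpy as np


def td3_target(reward, next_state, done, actor_target, critic1_target,
               critic2_target, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=None):
    """Compute the TD3 critic target for a batch of transitions.

    All hyperparameter defaults are placeholders, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    next_action = actor_target(next_state)
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = rng.normal(0.0, noise_std, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, act_low, act_high)
    # Clipped double-Q: use the smaller of the two target critic estimates.
    q_next = np.minimum(critic1_target(next_state, next_action),
                        critic2_target(next_state, next_action))
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * q_next


# Hypothetical stand-ins for the target networks, used only to run the sketch.
def toy_actor(s):
    return np.tanh(s.sum(axis=1, keepdims=True))


def toy_q1(s, a):
    return s.sum(axis=1) + a[:, 0]


def toy_q2(s, a):
    return s.sum(axis=1) - a[:, 0]


states = np.array([[1.0, 1.0], [0.5, -0.5]])
rewards = np.array([1.0, 2.0])
dones = np.array([0.0, 1.0])
# noise_std=0.0 makes the example deterministic.
targets = td3_target(rewards, states, dones, toy_actor, toy_q1, toy_q2,
                     noise_std=0.0)
print(targets)
```

Taking the minimum of two critics counteracts the value overestimation that a single critic tends to accumulate. The delayed actor update and the soft target-network update that complete TD3 are omitted here for brevity.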

Abstract: Using deep reinforcement learning to control vehicles at signal-free intersections is a research hotspot in the field of intelligent transportation. Previous studies suffer from an inability to adapt to dynamic changes in the number of autonomous vehicles, slow training convergence, and training results that reach only local optima. This work focuses on how autonomous vehicles can use distributed deep reinforcement learning methods to improve traffic efficiency at signal-free intersections. First, an efficient reward function is proposed and a distributed reinforcement learning algorithm is applied to the signal-free intersection scenario, so that vehicles can effectively improve traffic efficiency by relying only on local information, even when they cannot obtain the state of the whole intersection. Then, to address the low training efficiency of reinforcement learning methods in open intersection scenarios, a transfer learning approach is used: the policy trained in the closed figure-of-eight scenario serves as a warm start, and training continues in the signal-free intersection scenario. Finally, a policy that adapts to any proportion of autonomous vehicles is proposed, and it improves traffic efficiency in scenarios with any proportion of autonomous vehicles. The TD3 algorithm is validated on the simulation platform Flow, and the experimental results show that the improved algorithm converges quickly in training, adapts to dynamic changes in the proportion of autonomous vehicles, and effectively improves intersection traffic efficiency.

Key words: Autonomous vehicles, Deep reinforcement learning, Signal-free intersections, V2X
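The warm start described in the abstract amounts to initializing the intersection policy from parameters learned in the closed figure-of-eight scenario before training continues. The sketch below shows one simple way to do this, assuming policies are stored as dictionaries of NumPy arrays: any tensor whose name and shape match is copied, and everything else keeps its fresh initialization. The helper `warm_start`, the parameter names, and the layer shapes are hypothetical; the paper's actual network layout and checkpoint mechanism are not given here.

```python
import numpy as np


def warm_start(target_params, source_params):
    """Initialize target parameters from a source policy where shapes match.

    Returns the merged parameter dict and the names that were reused.
    """
    merged, reused = {}, []
    for name, value in target_params.items():
        src = source_params.get(name)
        if src is not None and src.shape == value.shape:
            # Reuse the tensor learned in the source scenario.
            merged[name] = src.copy()
            reused.append(name)
        else:
            # Missing or incompatible tensor: keep the fresh initialization.
            merged[name] = value.copy()
    return merged, reused


rng = np.random.default_rng(0)
# Hypothetical policy trained in the closed figure-of-eight scenario.
source = {"hidden/w": rng.normal(size=(4, 8)), "out/w": rng.normal(size=(8, 1))}
# Hypothetical freshly initialized intersection policy; the observation size
# is assumed to differ here, so the first layer cannot be reused.
target = {"hidden/w": np.zeros((6, 8)), "out/w": np.zeros((8, 1))}

merged, reused = warm_start(target, source)
print(reused)
```

If the two scenarios share the same observation and action dimensions, every tensor matches and the whole policy is transferred; the shape check only matters when they differ.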

CLC number: TP391

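OUYANG Zhuo, ZHOU Si-yuan, LYU Yong, TAN Guo-ping, ZHANG Yue, XIANG Liang-liang. Vehicle Control at Signal-free Intersections Based on Deep Reinforcement Learning[J]. Computer Science, 2022, 49(3): 46-51. https://doi.org/10.11896/jsjkx.210700010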

OUYANG Zhuo, ZHOU Si-yuan, LYU Yong, TAN Guo-ping, ZHANG Yue, XIANG Liang-liang. DRL-based Vehicle Control Strategy for Signal-free Intersections[J]. Computer Science, 2022, 49(3): 46-51. https://doi.org/10.11896/jsjkx.210700010
