Computer Science, 2020, Vol. 47, Issue (2): 169-174. doi: 10.11896/jsjkx.190600154

• Artificial Intelligence •

Traffic Signal Control Method Based on Deep Reinforcement Learning

SUN Hao, CHEN Chun-lin, LIU Qiong, ZHAO Jia-bao

  1. (Department of Control and Systems Engineering, Nanjing University, Nanjing 210093, China)
  • Received: 2019-03-25 Online: 2020-02-15 Published: 2020-03-18
  • Corresponding author: ZHAO Jia-bao (jbzhao@nju.edu.cn)
  • About author: SUN Hao, born in 1996, postgraduate. His main research interests include deep learning and reinforcement learning; ZHAO Jia-bao, born in 1972, Ph.D., associate professor. His main research interests include coordination and control methods for CAVs and knowledge automation in AIOps (Artificial Intelligence for IT Operations).
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (71732003) and the National Key Research and Development Program of China (2016YFD0702100).

Abstract: Intelligent control of traffic signals is a persistent research focus in intelligent transportation systems. To coordinate traffic adaptively in a more timely and effective way, this paper proposes a traffic signal control model based on distributional deep reinforcement learning. The model uses a deep neural network framework and improves its performance with a target network, a double Q network and a value distribution. The high-dimensional real-time traffic information at an intersection is discretized and combined with the waiting time, queue length, delay time and phase information of the corresponding lanes to form the state input. With the phase sequence, actions and rewards appropriately defined, the model learns the traffic signal control policy online and realizes adaptive control by the traffic signal agent. To validate the proposed algorithm, it was compared with three typical deep reinforcement learning algorithms in SUMO (Simulation of Urban Mobility) under the same settings. The experimental results show that the distributional deep reinforcement learning algorithm is more efficient and robust in controlling the traffic signal agent, and performs better on the average delay, travel time, queue length and waiting time of vehicles at the intersection.

Key words: Deep reinforcement learning, Distributional reinforcement learning, Intelligent transportation, Traffic signal control
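
As an illustration of how the three components named in the abstract (target network, double Q network, value distribution) can fit together, the following is a minimal sketch and not the authors' implementation. It assumes a categorical value distribution over a fixed, evenly spaced set of return atoms (a C51-style representation, which the abstract does not specify). The function names, the atom range and the toy probabilities are all hypothetical. The online network's distributions choose the next action, and the target network's distribution for that action is shifted by the reward and projected back onto the atoms.

```python
import math
from typing import List, Sequence


def expected_values(dists: Sequence[Sequence[float]],
                    atoms: Sequence[float]) -> List[float]:
    """Q-value of each action: the mean of its return distribution."""
    return [sum(z * p for z, p in zip(atoms, dist)) for dist in dists]


def project_distribution(next_dist: Sequence[float], atoms: Sequence[float],
                         reward: float, gamma: float, done: bool) -> List[float]:
    """Project the shifted atoms reward + gamma * z onto the fixed support.

    Assumes `atoms` is evenly spaced and sorted in increasing order.
    """
    n = len(atoms)
    v_min, v_max = atoms[0], atoms[-1]
    delta = (v_max - v_min) / (n - 1)
    target = [0.0] * n
    for z, p in zip(atoms, next_dist):
        # Bellman update of a single atom, clipped to the support range.
        tz = reward if done else reward + gamma * z
        tz = min(max(tz, v_min), v_max)
        # Fractional index of the shifted atom on the support.
        b = min(max((tz - v_min) / delta, 0.0), float(n - 1))
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            target[lower] += p
        else:
            # Split the probability mass between the two neighbouring atoms.
            target[lower] += p * (upper - b)
            target[upper] += p * (b - lower)
    return target


def double_q_target(online_next: Sequence[Sequence[float]],
                    target_next: Sequence[Sequence[float]],
                    atoms: Sequence[float], reward: float,
                    gamma: float, done: bool) -> List[float]:
    """Double Q step: the online network selects, the target network evaluates."""
    q_online = expected_values(online_next, atoms)
    best_action = max(range(len(q_online)), key=q_online.__getitem__)
    return project_distribution(target_next[best_action], atoms,
                                reward, gamma, done)


if __name__ == "__main__":
    # Hypothetical toy setup: 5 atoms, 2 signal actions.
    atoms = [-2.0, -1.0, 0.0, 1.0, 2.0]
    online_next = [[0.0, 0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
    target_next = [[0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0]]
    print(double_q_target(online_next, target_next, atoms,
                          reward=0.5, gamma=0.5, done=False))
```

In this toy case the target network on its own would prefer action 1, but the online network selects action 0, so the target distribution is built from the target network's estimate for action 0. Decoupling selection from evaluation in this way is the purpose of the double Q step. In training, the resulting target would typically be compared with the online network's predicted distribution through a cross-entropy loss.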

CLC Number:

  • TP181