Computer Science ›› 2020, Vol. 47 ›› Issue (2): 169-174.doi: 10.11896/jsjkx.190600154

• Artificial Intelligence •

Traffic Signal Control Method Based on Deep Reinforcement Learning

SUN Hao,CHEN Chun-lin,LIU Qiong,ZHAO Jia-bao   

  1. (Department of Control and Systems Engineering,Nanjing University,Nanjing 210093,China)
  • Received:2019-03-25 Online:2020-02-15 Published:2020-03-18
  • About author:SUN Hao,born in 1996,postgraduate.His main research interests include deep learning and reinforcement learning;ZHAO Jia-bao,born in 1972,Ph.D,associate professor.His main research interests include coordination and control methods for CAVs and knowledge automation in AIOps (Artificial Intelligence for IT Operations).
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (71732003) and National Key Research and Development Program of China (2016YFD0702100).

Abstract: Traffic signal control has long been a research hotspot in intelligent transportation systems. To adapt to and coordinate traffic more promptly and effectively, a novel traffic signal control algorithm based on distributional deep reinforcement learning is proposed. The model improves performance with a deep neural network framework that combines a target network, a double Q-network, and a value distribution. The state integrates the discretized high-dimensional real-time traffic information at the intersection with the waiting time, queue length, delay time, and phase information; with actions and rewards defined appropriately, the algorithm learns the traffic signal control strategy online and realizes adaptive control of traffic signals. It was compared with three typical deep reinforcement learning algorithms, with the experiments performed in SUMO (Simulation of Urban Mobility) under the same settings. The results show that the distributional deep reinforcement learning algorithm is more efficient and robust, and achieves better performance on average delay, travel time, queue length, and waiting time of vehicles.
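The value-distribution component the abstract refers to can be sketched as a categorical (C51-style) projection step: the Bellman-updated return distribution r + γz is projected back onto a fixed support of atoms, and the result serves as the target for a cross-entropy loss. All names and hyperparameters below (N_ATOMS, V_MIN, V_MAX, project_distribution) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Fixed support of the categorical return distribution (illustrative values).
N_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0
ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)      # support points z_i
DELTA_Z = (V_MAX - V_MIN) / (N_ATOMS - 1)       # spacing between atoms

def project_distribution(next_probs, reward, gamma=0.99, done=False):
    """Project the shifted distribution r + gamma*z onto the fixed support.

    next_probs: probabilities over ATOMS for the next state's chosen action
                (with double Q-learning, that action is picked by the online
                network and evaluated by the target network).
    Returns the target probability vector m for the cross-entropy loss.
    """
    tz = np.clip(reward + (0.0 if done else gamma) * ATOMS, V_MIN, V_MAX)
    b = (tz - V_MIN) / DELTA_Z                   # fractional index of each shifted atom
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    m = np.zeros(N_ATOMS)
    for j in range(N_ATOMS):
        if lower[j] == upper[j]:                 # lands exactly on a support point
            m[lower[j]] += next_probs[j]
        else:                                    # split mass between the two neighbours
            m[lower[j]] += next_probs[j] * (upper[j] - b[j])
            m[upper[j]] += next_probs[j] * (b[j] - lower[j])
    return m
```

In a full agent, next_probs would come from the target network for the intersection's next state (queue lengths, waiting times, phase encoding), and the resulting m would be the label in the training step; the sketch only shows the projection itself.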

Key words: Intelligent transportation, Traffic signal control, Deep reinforcement learning, Distributional reinforcement learning

CLC Number: TP181
[1]SUTTON R S,BARTO A G.Introduction to reinforcement learning[M].Cambridge:MIT Press,1998.
[2]BELLEMARE M G,DABNEY W,MUNOS R.A distributional perspective on reinforcement learning[C]∥Proceedings of the 34th International Conference on Machine Learning,2017:449-458.
[3]CHIU S.Adaptive traffic signal control using fuzzy logic[C]∥Proceedings of the Intelligent Vehicles '92 Symposium.IEEE,1992:98-107.
[4]PANDIT K,GHOSAL D,ZHANG H M,et al.Adaptive traffic signal control with vehicular ad hoc networks[J].IEEE Transactions on Vehicular Technology,2013,62(4):1459-1471.
[5]LIN W H,WANG C.An enhanced 0-1 mixed-integer LP formulation for traffic signal control[J].IEEE Transactions on Intelligent Transportation Systems,2004,5(4):238-245.
[6]PRASHANTH L A,BHATNAGAR S.Reinforcement learning with function approximation for traffic signal control[J].IEEE Transactions on Intelligent Transportation Systems,2010,12(2):412-421.
[7]GIRIANNA M,BENEKOHAL R F.Using genetic algorithms to design signal coordination for oversaturated networks[J].Journal of Intelligent Transportation Systems,2004,8(2):117-129.
[8]SANCHEZ-MEDINA J J,GALAN-MORENO M J,RUBIO-ROYO E.Traffic signal optimization in “La Almozara” district in Saragossa under congestion conditions,using genetic algorithms,traffic microsimulation,and cluster computing[J].IEEE Transactions on Intelligent Transportation Systems,2009,11(1):132-141.
[9]YU X H,RECKER W.Stochastic adaptive control model for traffic signal systems[J].Transportation Research Part C:Emerging Technologies,2006,14(4):263-282.
[10]GOKULAN B P,SRINIVASAN D.Distributed geometric fuzzy multi agent urban traffic signal control[J].IEEE Transactions on Intelligent Transportation Systems,2010,11(3):714-727.
[11]BOWLING M.Multi agent learning in the presence of agents with limitations[R].Pittsburgh:Carnegie Mellon University,School of Computer Science,2003.
[12]PRASHANTH L,BHATNAGAR S.Threshold tuning using stochastic optimization for graded signal control[J].IEEE Transactions on Vehicular Technology,2012,61(9):3865-3880.
[13]LIU W,QIN G,HE Y,et al.Distributed cooperative reinforcement learning-based traffic signal control that integrates V2X networks' dynamic clustering[J].IEEE Transactions on Vehicular Technology,2017,66(10):8667-8681.
[14]GENDERS W,RAZAVI S.Using a deep reinforcement learning agent for traffic signal control[J].arXiv:1611.01142.
[15]El-TANTAWY S,ABDULHAI B,ABDELGAWAD H.Multi agent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC):methodology and large-scale application on downtown Toronto[J].IEEE Transactions on Intelligent Transportation Systems,2013,14(3):1140-1150.
[16]WIERING M A.Multi-agent reinforcement learning for traffic light control[C]∥Machine Learning:Proceedings of the Seventeenth International Conference (ICML’2000).2000:1151-1158.
[17]WIERING M,VREEKEN J,VAN VEENEN J,et al.Simulation and optimization of traffic in a city[C]∥IEEE Intelligent Vehicles Symposium,2004.IEEE,2004:453-458.
[18]MARSETIC R,SEMROV D,ZURA M.Road artery traffic light optimization with use of the reinforcement learning[J].PROMET-Traffic & Transportation,2014,26(2):101-108.
[19]PUTERMAN M L.Markov Decision Processes:Discrete Stochastic Dynamic Programming[M].John Wiley & Sons,2014.
[20]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[21]VAN HASSELT H,GUEZ A,SILVER D.Deep reinforcement learning with double Q-learning[C]∥Proceedings of the AAAI Conference on Artificial Intelligence.2016:2094-2100.
[22]SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized experience replay[C]∥Proceedings of the 4th International Conference on Learning Representations.San Juan,Puerto Rico,2016:322-355.