Computer Science, 2016, Vol. 43, Issue (8): 171-176. doi: 10.11896/j.issn.1002-137X.2016.08.035


Promoted Traffic Control Strategy Based on Q Learning and Dynamic Weight

ZHANG Chen, YU Jian and HE Liang-hua   

  • Online: 2018-12-01  Published: 2018-12-01

Abstract: Q-learning is widely used in traffic signal control. In the traditional multi-agent traffic signal control policy, agents obtain intersection information over the network and use it to choose the best control decision. This works well in most cases, but it has a weakness: the global reward is calculated as a simple average, which can cause local congestion in some cases. This paper introduces an improved area traffic signal control method based on Q-learning. The new calculation uses an "intersection weight" that varies dynamically with real traffic conditions. Experiments were run with both the traditional and the improved methods, and the results show the advantage of the improved one.

Key words: Q-learning, Area traffic control, Intersection weight
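
The contrast the abstract draws between a simple-average global reward and a weighted one can be illustrated with a minimal sketch. The abstract does not give the paper's weight formula, so everything below is an assumption made for illustration: the function names are hypothetical, and the weight of each intersection is assumed to be proportional to its current queue length.

```python
from typing import List, Sequence


def average_global_reward(local_rewards: Sequence[float]) -> float:
    """Traditional scheme: every intersection counts equally."""
    return sum(local_rewards) / len(local_rewards)


def dynamic_weights(queue_lengths: Sequence[float]) -> List[float]:
    """Hypothetical weighting (an assumption, not the paper's formula):
    each weight is proportional to the intersection's current queue length.
    Falls back to equal weights when every queue is empty."""
    total = sum(queue_lengths)
    n = len(queue_lengths)
    if total == 0:
        return [1.0 / n] * n
    return [q / total for q in queue_lengths]


def weighted_global_reward(
    local_rewards: Sequence[float], queue_lengths: Sequence[float]
) -> float:
    """Weighted scheme: congested intersections dominate the global reward."""
    weights = dynamic_weights(queue_lengths)
    return sum(w * r for w, r in zip(weights, local_rewards))


# Three intersections; the third is congested and has a poor local reward.
local_rewards = [-2.0, -2.0, -20.0]
queue_lengths = [2.0, 2.0, 16.0]

avg = average_global_reward(local_rewards)
wtd = weighted_global_reward(local_rewards, queue_lengths)
print(f"simple average: {avg:.1f}, weighted: {wtd:.1f}")
```

In this toy case the simple average dilutes the congested intersection's poor reward, while the weighted form lets it dominate the signal that the agents learn from. Because the weights are recomputed from current queue lengths, they change as traffic changes, which is the behavior the abstract attributes to the intersection weight.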

