Computer Science ›› 2016, Vol. 43 ›› Issue (8): 171-176. doi: 10.11896/j.issn.1002-137X.2016.08.035

• Artificial Intelligence •

Improved Area Traffic Signal Control Method Based on Q-Learning and Dynamic Weights

ZHANG Chen, YU Jian, HE Liang-hua

  1. College of Electronics and Information Engineering, Tongji University, Shanghai 400047, China
  • Online: 2018-12-01 Published: 2018-12-01

Improved Traffic Control Strategy Based on Q-Learning and Dynamic Weights

ZHANG Chen, YU Jian and HE Liang-hua   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: Q-learning is widely applied in traffic signal control. In area traffic control, traditional Q-learning-based methods have agents exchange information about neighboring intersections with one another and then make the most favorable decision. These traditional methods perform well in most situations. However, because they compute the reward contribution of neighboring intersections' congestion inaccurately, they make wrong decisions when congestion levels at neighboring intersections differ greatly, which leads to local congestion hotspots. This paper analyzes that problem and, building on the traditional area traffic signal control method, proposes an improved area traffic signal control method based on Q-learning and dynamic weights. The notion of an "intersection weight" is introduced and applied to the reward calculation through a multi-objective combination, with each weight changing dynamically according to the actual traffic at its intersection, which resolves the tendency to fall into local congestion hotspots. Simulations under three different traffic conditions show that the proposed algorithm clearly outperforms the traditional control method under the "congested" condition.

Key words: Q-learning, area control, intersection weight
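The learning rule underlying the signal-control agents described above is standard tabular Q-learning (Watkins and Dayan). A minimal sketch follows; the state and action encoding (discretized queue length, green-phase choice) is an illustrative assumption, not the paper's actual design.

```python
# Minimal tabular Q-learning update. The state/action encoding below is a
# toy assumption for illustration, not the formulation used in the paper.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9           # learning rate, discount factor

Q = defaultdict(float)            # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Toy example: state = discretized queue length, action = green-phase choice.
actions = ["ns_green", "ew_green"]
q_update(state=3, action="ns_green", reward=-2.0, next_state=2, actions=actions)
print(Q[(3, "ns_green")])         # -0.2 after one step from an all-zero table
```

Each agent maintains such a table for its own intersection; the methods compared in this paper differ in how the reward fed into this update aggregates information from neighboring intersections.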

Abstract: Q-learning is widely used in traffic signal control. In the traditional multi-agent traffic signal control policy, agents obtain information about neighboring intersections over the network and make the best control decision. This works well in most cases, but the traditional policy has a weakness: the global reward is calculated as a simple average, which may cause local congestion in some cases. This paper introduces an improved area traffic signal control method based on Q-learning, in which an "intersection weight" that varies dynamically with the real traffic condition is used in a new reward calculation. Both the traditional and the improved methods were evaluated experimentally, and the results show the advantage of the improved one.

Key words: Q-learning, area traffic control, intersection weight
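The two reward schemes the abstract contrasts, a simple average versus a dynamically weighted combination, can be sketched as follows. The queue-length-based weighting and all function names are assumptions for illustration; the paper's exact multi-objective combination is not reproduced here.

```python
# Illustrative contrast of the two global-reward schemes in the abstract.
# The queue-length-based weighting is an assumed stand-in for the paper's
# multi-objective combination.

def simple_average_reward(local_rewards):
    """Traditional policy: a plain average, so a heavily congested neighbor
    counts no more than a free-flowing one."""
    return sum(local_rewards) / len(local_rewards)

def weighted_reward(local_rewards, queue_lengths):
    """Improved policy: each intersection's reward is scaled by a dynamic
    'intersection weight' derived from its current congestion (here, its
    normalized queue length)."""
    total_queue = sum(queue_lengths)
    if total_queue == 0:          # no congestion anywhere: fall back to average
        return simple_average_reward(local_rewards)
    weights = [q / total_queue for q in queue_lengths]
    return sum(w * r for w, r in zip(weights, local_rewards))

# A congested neighbor (queue 18) dominates the weighted reward, steering
# the agent away from decisions that would worsen the hotspot.
rewards = [-1.0, -9.0, -2.0]      # per-intersection rewards (more negative = worse)
queues  = [2, 18, 4]              # current queue lengths at each intersection

print(simple_average_reward(rewards))    # -4.0
print(weighted_reward(rewards, queues))  # dominated by the congested neighbor
```

Under the simple average, the hotspot's large negative reward is diluted by its uncongested neighbors; under the weighted scheme it dominates, which is the mechanism the paper credits for avoiding local congestion hotspots.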

