Computer Science ›› 2016, Vol. 43 ›› Issue (8): 171-176. doi: 10.11896/j.issn.1002-137X.2016.08.035

• Artificial Intelligence •

Improved Area Traffic Signal Control Method Based on Q-Learning and Dynamic Weights

ZHANG Chen, YU Jian, HE Liang-hua

  1. College of Electronics and Information Engineering, Tongji University, Shanghai 400047, China
  • Online: 2018-12-01 Published: 2018-12-01

Improved Traffic Control Strategy Based on Q-Learning and Dynamic Weights

ZHANG Chen, YU Jian and HE Liang-hua   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: Q-learning is widely applied in traffic signal control. In area traffic control, traditional Q-learning-based methods have agents exchange information about neighboring intersections with one another and then make the most favorable decision. These traditional methods perform well in most situations. However, because they compute the reward contribution of neighboring intersections' congestion inaccurately, they make wrong decisions when congestion levels at neighboring intersections differ greatly, which leads to local congestion hotspots. This paper analyzes that problem and, building on the traditional area traffic signal control method, proposes an improved area traffic signal control method based on Q-learning and dynamic weights. The notion of an "intersection weight" is introduced and applied to the reward calculation through a multi-objective combination, with each weight changing dynamically according to the actual traffic at its intersection, which resolves the tendency to fall into local congestion hotspots. Simulations under three different traffic conditions show that the proposed algorithm clearly outperforms the traditional control method under the "congested" condition.

Key words: Q-learning, area control, intersection weight
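The learning rule underlying the signal-control agents described above is standard tabular Q-learning (Watkins and Dayan). A minimal sketch follows; the state and action encoding (discretized queue length, green-phase choice) is an illustrative assumption, not the paper's actual design.

```python
# Minimal tabular Q-learning update. The state/action encoding below is a
# toy assumption for illustration, not the formulation used in the paper.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9           # learning rate, discount factor

Q = defaultdict(float)            # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Toy example: state = discretized queue length, action = green-phase choice.
actions = ["ns_green", "ew_green"]
q_update(state=3, action="ns_green", reward=-2.0, next_state=2, actions=actions)
print(Q[(3, "ns_green")])         # -0.2 after one step from an all-zero table
```

Each agent maintains such a table for its own intersection; the methods compared in this paper differ in how the reward fed into this update aggregates information from neighboring intersections.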

Abstract: Q-learning is widely used in traffic signal control. In the traditional multi-agent traffic signal control policy, agents obtain information about neighboring intersections over the network and make the best control decision. This works well in most cases, but the traditional policy has a weakness: the global reward is calculated as a simple average, which may cause local congestion in some cases. This paper introduces an improved area traffic signal control method based on Q-learning, in which an "intersection weight" that varies dynamically with the real traffic condition is used in a new reward calculation. Both the traditional and the improved methods were evaluated experimentally, and the results show the advantage of the improved one.

Key words: Q-learning, area traffic control, intersection weight
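The two reward schemes the abstract contrasts, a simple average versus a dynamically weighted combination, can be sketched as follows. The queue-length-based weighting and all function names are assumptions for illustration; the paper's exact multi-objective combination is not reproduced here.

```python
# Illustrative contrast of the two global-reward schemes in the abstract.
# The queue-length-based weighting is an assumed stand-in for the paper's
# multi-objective combination.

def simple_average_reward(local_rewards):
    """Traditional policy: a plain average, so a heavily congested neighbor
    counts no more than a free-flowing one."""
    return sum(local_rewards) / len(local_rewards)

def weighted_reward(local_rewards, queue_lengths):
    """Improved policy: each intersection's reward is scaled by a dynamic
    'intersection weight' derived from its current congestion (here, its
    normalized queue length)."""
    total_queue = sum(queue_lengths)
    if total_queue == 0:          # no congestion anywhere: fall back to average
        return simple_average_reward(local_rewards)
    weights = [q / total_queue for q in queue_lengths]
    return sum(w * r for w, r in zip(weights, local_rewards))

# A congested neighbor (queue 18) dominates the weighted reward, steering
# the agent away from decisions that would worsen the hotspot.
rewards = [-1.0, -9.0, -2.0]      # per-intersection rewards (more negative = worse)
queues  = [2, 18, 4]              # current queue lengths at each intersection

print(simple_average_reward(rewards))    # -4.0
print(weighted_reward(rewards, queues))  # dominated by the congested neighbor
```

Under the simple average, the hotspot's large negative reward is diluted by its uncongested neighbors; under the weighted scheme it dominates, which is the mechanism the paper credits for avoiding local congestion hotspots.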

