Computer Science ›› 2020, Vol. 47 ›› Issue (12): 226-232. doi: 10.11896/jsjkx.200300021

• Artificial Intelligence •


Signal Control of Single Intersection Based on Improved Deep Reinforcement Learning Method

LIU Zhi, CAO Shi-peng, SHEN Yang, YANG Xi   

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2020-03-03  Revised: 2020-05-10  Published: 2020-12-17
  • Corresponding author: YANG Xi (xyang@zjut.edu.cn)
  • About author: LIU Zhi, born in 1969, Ph.D, professor, is a member of China Computer Federation (lzhi@zjut.edu.cn). Her main research interests include intelligent transportation and image processing.
    YANG Xi, born in 1982, Ph.D, associate professor. His main research interests include control and optimization theory and intelligent transportation systems.
  • Supported by:
    Public Welfare Technology Research Project of Zhejiang Province,China(LGG20F030008) and Natural Science Foundation of Zhejiang Province,China(LY20F030018).


Abstract: Using deep reinforcement learning technology to achieve traffic signal control is a research hotspot in the field of intelligent transportation. Existing studies mainly focus on describing traffic conditions comprehensively within a reinforcement learning formulation and on designing effective reinforcement learning algorithms to solve the signal timing problem. However, the influence of the signal state on action selection and the sampling efficiency of data in the experience pool are rarely considered, which may result in an unstable training process and slow convergence. This paper incorporates the signal state into the state design of the agent model, and introduces action reward and punishment coefficients to adjust the agent's action selection so as to meet the constraints of minimum and maximum green time. Meanwhile, considering the temporal correlation of short-term traffic flow, the PSER (Priority Sequence Experience Replay) method is used to update the priorities of sequence samples in the experience pool, which helps the agent obtain preceding correlated samples that better match the current traffic conditions. The double deep Q network and dueling deep Q network are then used to further improve the performance of the DQN (Deep Q Network) algorithm. Finally, taking the single intersection of Shixinzhong Road and Shanyin Road, Xiaoshan District, Hangzhou, as an example, the algorithm is verified on the simulation platform SUMO (Simulation of Urban Mobility). Experimental results show that the proposed agent model outperforms unconstrained single-state agent models for traffic signal control, and the proposed algorithm can effectively reduce the average waiting time of vehicles and the total queue length at the intersection, with overall control performance better than both the actual signal timing strategy and the traditional DQN algorithm.
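To make the constrained action selection described above concrete, the following is a minimal sketch of one plausible reading of the scheme, not the authors' implementation; the constants MIN_GREEN, MAX_GREEN and PENALTY, and all function names, are assumptions for exposition only.

```python
import numpy as np

# Illustrative constants only -- the paper's actual values are not stated here.
MIN_GREEN = 10   # minimum green time per phase, in seconds (assumed)
MAX_GREEN = 60   # maximum green time per phase, in seconds (assumed)
PENALTY = -1.0   # assumed action reward/punishment coefficient

def select_action(q_values, current_phase, elapsed_green):
    """Greedy phase selection that respects the min/max green-time constraints.

    q_values: 1-D float array of Q-values, one per signal phase.
    current_phase: index of the phase currently showing green.
    elapsed_green: seconds the current phase has been green.
    """
    if elapsed_green < MIN_GREEN:
        return current_phase             # switching early is disallowed
    if elapsed_green >= MAX_GREEN:
        q = q_values.copy()
        q[current_phase] = -np.inf       # holding the phase longer is disallowed
        return int(np.argmax(q))
    return int(np.argmax(q_values))

def shaped_reward(base_reward, action, current_phase, elapsed_green):
    """Add the punishment term when an action would break a constraint,
    so the agent also learns to avoid such actions rather than relying
    on the hard mask alone."""
    violates = ((action != current_phase and elapsed_green < MIN_GREEN) or
                (action == current_phase and elapsed_green >= MAX_GREEN))
    return base_reward + (PENALTY if violates else 0.0)
```

Because the paper folds the signal state (current phase and elapsed green time) into the observation, the agent can in principle learn the constraint rather than depend solely on the hard mask.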

Key words: Action reward and punishment coefficient, Deep Q Network, Priority sequence experience replay, Signal control, Weighted multi-index coefficient
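On the sampling side, the PSER rule of ref. [10] propagates a transition's freshly computed priority backwards onto its predecessors with geometric decay, so that the samples leading up to an informative state are replayed more often. A minimal sketch, with RHO and WINDOW as assumed hyperparameters:

```python
RHO = 0.65    # assumed decay coefficient for back-propagated priority
WINDOW = 5    # assumed number of predecessor transitions to boost

def pser_update(priorities, idx, td_error, eps=1e-6):
    """Set the priority of transition `idx` from its TD error, then
    propagate a geometrically decayed share onto the preceding
    transitions: p[idx-n] = max(p[idx-n], p[idx] * RHO**n)."""
    priorities[idx] = abs(td_error) + eps
    for n in range(1, min(WINDOW, idx) + 1):
        priorities[idx - n] = max(priorities[idx - n],
                                  priorities[idx] * RHO ** n)
    return priorities

# Sampling then proceeds as in standard prioritized replay, with
# P(i) proportional to priorities[i] ** alpha.
```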

CLC Number: TP181

References
[1] HUO Y S.A Summary of Traffic Signal Control Method Based on Reinforcement Learning[C]//The 12th Annual Conference of China Intelligent Transportation.2017:858-865.
[2] SUN H,CHEN C L,LIU Q,et al.Traffic Signal Control Method Based on Deep Reinforcement Learning[J].Computer Science,2020,47(2):169-174.
[3] ZENG J,HU J,ZHANG Y.Adaptive Traffic Signal Control with Deep Recurrent Q-learning[C]//IEEE Intelligent Vehicles Symposium.2018:1215-1220.
[4] GAO J,SHEN Y,LIU J,et al.Adaptive Traffic Signal Control:Deep Reinforcement Learning Algorithm with Experience Replay and Target Network[J].arXiv:1705.02755,2017.
[5] GENDERS W,RAZAVI S.Using a Deep Reinforcement Learning Agent for Traffic Signal Control[J].arXiv:1611.01142,2016.
[6] WAN C H,HWANG M C.Value-based deep reinforcement learning for adaptive isolated intersection signal control[J].IET Intelligent Transport Systems,2018,12(9):1005-1010.
[7] MURESAN M,FU L,PAN G.Adaptive Traffic Signal Control with Deep Reinforcement Learning:An Exploratory Investigation[C]//Transportation Research Board 97th Annual Meeting.2018:18-33.
[8] LI L,LYU Y,WANG F Y,et al.Traffic Signal Timing via Deep Reinforcement Learning[J].IEEE/CAA Journal of Automatica Sinica,2016,3(3):247-254.
[9] LIANG X,DU X,WANG G,et al.A Deep Reinforcement Learning Network for Traffic Light Cycle Control[J].IEEE Transactions on Vehicular Technology,2019,68(2):1243-1253.
[10] BRITTAIN M,BERTRAM J,YANG X,et al.Prioritized Sequence Experience Replay[J].arXiv:1905.12726,2019.
[11] YAU K,QADIR J,KHOO H,et al.A Survey on Reinforcement Learning Models and Algorithms for Traffic Signal Control[J].ACM Computing Surveys,2017,50(3):1-38.
[12] ASLANI M,SEIPEL S,SAADI M,et al.Traffic signal optimization through discrete and continuous reinforcement learning with robustness analysis in downtown Tehran[J].Advanced Engineering Informatics,2018,38:639-655.
[13] ABDULHAI B,PRINGLE R,KARAKOULAS G.Reinforcement learning for true adaptive traffic signal control[J].Journal of Transportation Engineering,2003,129(3):278-285.
[14] THORPE T L,ANDERSON C W.Traffic light control using SARSA with three state representations[R].Technical Report,IBM Corporation,1996.
[15] EL-TANTAWY S,ABDULHAI B,ABDELGAWAD H.Design of Reinforcement Learning Parameters for Seamless Application of Adaptive Traffic Signal Control[J].Journal of Intelligent Transportation Systems,2014,18(3):227-245.
[16] LAI J H.Traffic Signal Control based on Double Deep Q-learning Network with Dueling Architecture[J].Computer Science,2019,46(S2):117-121.
[17] WANG Z,SCHAUL T,HESSEL M,et al.Dueling Network Architectures for Deep Reinforcement Learning[C]//Proceedings of the 33rd International Conference on Machine Learning.2016:1995-2003.
[18] VAN HASSELT H,GUEZ A,SILVER D.Deep Reinforcement Learning with Double Q-learning[C]//Proceedings of the 30th AAAI Conference on Artificial Intelligence.2016:2094-2100.
[19] SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized Experience Replay[C]//Proceedings of the 4th International Conference on Learning Representations.2016:322-355.
[20] FOERSTER J N,ASSAEL Y M,DE FREITAS N,et al.Learning to Communicate with Deep Multi-Agent Reinforcement Learning[C]//Advances in Neural Information Processing Systems 29.2016:10-22.