Computer Science ›› 2019, Vol. 46 ›› Issue (11A): 117-121.

• Intelligent Computing •

Traffic Signal Control Based on Double Deep Q-learning Network with Dueling Architecture

LAI Jian-hui

  1. (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
  • Online: 2019-11-10  Published: 2019-11-20
  • About the author: LAI Jian-hui, born in 1994, male, master's student. His main research interest is intersection signal control in intelligent transportation. E-mail: 15757116547@163.com.


Abstract: The intersection is the core and hub of the urban road network, and reasonable optimization of intersection signal control can greatly improve the operational efficiency of the urban transportation system; taking real-time traffic information as input and dynamically adjusting the phase times of traffic signals has therefore become an important direction of current research. This paper proposes a traffic signal control method based on the double deep Q-learning network with dueling architecture (D3QN). A deep neural network is combined with the traffic signal controller to form an agent that adjusts the signal control strategy of the intersection. The DTSE (discrete traffic state encoding) method then transforms the traffic state of the intersection into a two-dimensional matrix composed of vehicle position and speed information, from which the deep network extracts high-level feature representations, enabling accurate perception of the traffic state. On this basis, an adaptive traffic signal control strategy is realized through reinforcement learning. Finally, simulation experiments are conducted with the microscopic traffic simulator SUMO, with fixed-time control and actuated control as baselines. The results show that the proposed method achieves better control performance and is therefore feasible and effective.
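The DTSE step described in the abstract can be sketched as follows. This is only an illustrative reconstruction: the lane length, cell length, speed limit, and two-row layout are assumptions for the example, not the paper's exact parameters.

```python
import numpy as np

def dtse_encode(vehicles, lane_length=100.0, cell_length=5.0, speed_limit=13.9):
    """Encode one lane as a 2 x n_cells matrix (DTSE):
    row 0 = cell occupancy (0 or 1), row 1 = speed normalized to [0, 1]."""
    n_cells = int(lane_length / cell_length)
    state = np.zeros((2, n_cells))
    for position, speed in vehicles:
        cell = min(int(position / cell_length), n_cells - 1)
        state[0, cell] = 1.0                   # a vehicle occupies this cell
        state[1, cell] = speed / speed_limit   # normalized speed of that vehicle
    return state

# Two vehicles: one at 12 m doing 10 m/s, one stopped at 47 m
s = dtse_encode([(12.0, 10.0), (47.0, 0.0)])
```

Stacking such matrices over all approach lanes yields the two-dimensional position/speed input that the deep network consumes.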

Key words: Traffic signal control, Reinforcement learning, Deep reinforcement learning, Deep learning, Intelligent transportation

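The D3QN model named in the title combines a dueling network head with the double-Q update of double DQN. The following numpy sketch illustrates those two pieces in isolation; the arrays stand in for network outputs, and the reward, discount factor of 0.99, and four-phase action set are illustrative assumptions, not values from the paper.

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return value + advantages - advantages.mean()

def double_q_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]

# Illustrative values for a four-phase intersection
q = dueling_q(value=1.0, advantages=np.array([0.5, -0.5, 1.0, -1.0]))
y = double_q_target(reward=-2.0, gamma=0.99,
                    q_online_next=np.array([0.1, 0.9, 0.3, 0.2]),
                    q_target_next=np.array([0.0, 0.5, 0.8, 0.1]))
```

Subtracting the mean advantage makes the V/A decomposition identifiable, and letting the online network choose the action while the target network scores it reduces the overestimation bias of vanilla Q-learning.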

CLC Number:

  • TP391