Computer Science, 2023, Vol. 50, Issue (4): 159-171. doi: 10.11896/jsjkx.220500261
YU Ze1, NING Nianwen1,4, ZHENG Yanliu2, LYU Yining1, LIU Fuqiang3, ZHOU Yi1,4
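Abstract: As urban populations grow rapidly, the number of private cars is increasing exponentially, placing even greater pressure on already overburdened transportation systems and making traffic congestion increasingly prominent. Traditional traffic signal control techniques struggle to adapt to complex and changing traffic conditions, and data-driven methods offer a new direction for control-based systems. The combination of deep reinforcement learning with traffic control systems plays an important role in adaptive traffic signal control. First, this paper reviews recent progress in the application of intelligent traffic signal control systems, classifies and discusses intelligent traffic signal control methods, and summarizes existing work in the field. Second, deep reinforcement learning can effectively address three problems in intelligent traffic signal control: inaccurate acquisition of state information, poor robustness of control algorithms, and weak regional coordination. On this basis, the paper gives an overview of simulation platforms and experimental settings for intelligent traffic signal control, with analysis and validation through examples. Finally, it discusses the challenges and open problems facing the field and summarizes future research directions.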