Computer Science, 2020, Vol. 47, Issue 12: 226-232. doi: 10.11896/jsjkx.200300021
刘志, 曹诗鹏, 沈阳, 杨曦
LIU Zhi, CAO Shi-peng, SHEN Yang, YANG Xi
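Abstract: Using deep reinforcement learning to control intersection signals is a research hotspot in intelligent transportation. Most existing studies use reinforcement learning to characterize the traffic state comprehensively and design effective reinforcement learning algorithms to solve the signal timing problem. However, these studies often overlook the influence of the signal light state on action selection and the sampling efficiency of data in the experience pool, which leads to unstable training and slow convergence. To address this, in the design of the agent model, this paper incorporates the signal light state into the state design and introduces an action reward-penalty coefficient to regulate the agent's action selection, so that the minimum and maximum green time constraints of each phase are satisfied. In addition, considering the temporal correlation of traffic flow over short periods, this paper adopts Priority Sequence Experience Replay (PSER) to update the priorities of sequence samples in the experience pool, so that the agent obtains preceding correlated samples that better match the traffic conditions, and it further improves the performance of the DQN (Deep Q Network) algorithm with a double Q network and a dueling Q network. Finally, taking the single intersection formed by Shixin Middle Road and Shanyin Road in Xiaoshan District, Hangzhou as an example, the algorithm is verified on the simulation platform SUMO (Simulation of Urban Mobility). Experimental results show that the proposed agent model outperforms the unconstrained single-state model, and that the algorithm built on it effectively reduces the average vehicle waiting time and the total queue length at the intersection, with control performance better than both the actual timing strategy and the traditional DQN algorithm.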
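The three DQN extensions named in the abstract can be illustrated with a minimal sketch. It uses the commonly published forms of dueling aggregation, the double Q-learning target, and backward priority decay over a sequence, and it is not taken from the paper's implementation. All function names, and the `decay` and `window` parameters, are hypothetical choices made for illustration.

```python
from typing import List


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Combine a state value V(s) and advantages A(s, a) into Q(s, a).

    Subtracting the mean advantage is the usual dueling aggregation.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, gamma: float, done: bool,
                      q_online_next: List[float],
                      q_target_next: List[float]) -> float:
    """Double Q-learning target: the online network selects the next
    action and the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)),
                      key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


def propagate_priority(priorities: List[float], index: int,
                       decay: float, window: int) -> None:
    """Spread the priority at `index` backward over up to `window`
    earlier transitions, keeping the larger of the old priority and
    the decayed one. `decay` and `window` are assumed hyperparameters."""
    for step in range(1, window + 1):
        prev = index - step
        if prev < 0:
            break
        decayed = priorities[index] * decay ** step
        priorities[prev] = max(priorities[prev], decayed)


# Illustrative values only.
q_values = dueling_q(1.0, [0.5, -0.5])
target = double_dqn_target(1.0, 0.9, False, [0.2, 0.8], [0.5, 0.3])
priorities = [0.1, 0.1, 0.1, 1.0]
propagate_priority(priorities, index=3, decay=0.5, window=2)
```

In this sketch, raising the priority of earlier transitions is what lets a sampler draw the preceding correlated samples that lead up to a high-priority transition. The reward-penalty coefficient for the green time constraints is left out because the abstract does not give its form.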