Q-learning with Feature-based Approximation for Traffic Light Control

Computer Science, 2018, Vol. 45, Issue (11A): 143-145.

LI Min-shuo (李旻朔)^1,2, YAO Ming-hai (姚明海)^2

1. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
2. College of Information Engineering, Zhejiang University of Technology, Hangzhou 310000, China

Online: 2019-02-26 Published: 2019-02-26
Corresponding author: LI Min-shuo (born 1974), female, master's degree, lecturer. Main research interest: machine learning. E-mail: lmshappy@zjnu.cn
About the author: YAO Ming-hai (born 1964), male, Ph.D., professor. Main research interests: cognitive learning and machine learning.

Abstract

Abstract: Reinforcement learning (RL) learns a behavior policy through interaction with the environment. RL methods are online and incremental, and they are easy to implement. This paper proposes an RL algorithm based on function approximation and applies it to adaptive traffic light control (TLC). Table-based RL requires a full state representation, so its computational complexity grows exponentially with the number of lanes and junctions. It is hard to implement even for small and medium-sized road networks and therefore cannot be applied to real traffic light control. The paper instead uses a feature-based state representation to address the curse of dimensionality: the congestion level of the traffic flow and the duration of the red light are obtained by simple means, and the Q-value is defined through function approximation, which yields efficient adaptive control. Simulation experiments on GLD verify the effectiveness and feasibility of the proposed adaptive control method.

Key words: Q-learning, Reinforcement learning, Adaptive traffic light control
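
To make the idea in the abstract concrete, the following is a minimal sketch of Q-learning with a linear, feature-based approximation, not the authors' implementation. The abstract only states that congestion levels and red-light duration serve as features, so everything else here is an assumption made for illustration: the thresholds, the three-level discretisation, the block-per-action feature layout, the reward, and all function names (congestion_level, red_level, features, q_update, table_states) are hypothetical.

```python
# Illustrative sketch only: thresholds, feature layout and names are assumptions.

def congestion_level(queue_len, low=5, high=15):
    """Map a lane's queue length to a coarse level: 0 low, 1 medium, 2 high."""
    if queue_len < low:
        return 0
    if queue_len < high:
        return 1
    return 2

def red_level(elapsed_red, threshold=30):
    """1 if the lane has been red for at least `threshold` seconds, else 0."""
    return 1 if elapsed_red >= threshold else 0

def features(queues, red_times, action, n_actions):
    """Feature vector phi(s, a): per-lane features copied into the block of `action`."""
    per_lane = []
    for q, t in zip(queues, red_times):
        per_lane.append(congestion_level(q) / 2.0)  # scaled to [0, 1]
        per_lane.append(float(red_level(t)))
    per_lane.append(1.0)  # bias term
    phi = [0.0] * (len(per_lane) * n_actions)
    start = action * len(per_lane)
    phi[start:start + len(per_lane)] = per_lane
    return phi

def q_value(w, phi):
    """Approximate Q(s, a) as the dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, phi))

def q_update(w, state, action, reward, next_state, n_actions,
             alpha=0.05, gamma=0.9):
    """One Q-learning step on the weights; returns (new_weights, td_error)."""
    phi = features(*state, action, n_actions)
    best_next = max(q_value(w, features(*next_state, a, n_actions))
                    for a in range(n_actions))
    td_error = reward + gamma * best_next - q_value(w, phi)
    new_w = [wi + alpha * td_error * xi for wi, xi in zip(w, phi)]
    return new_w, td_error

def table_states(n_lanes, max_queue):
    """Number of states a table would need if each lane's queue were stored exactly."""
    return (max_queue + 1) ** n_lanes

# Usage: two lanes, two signal phases; a state is (queue lengths, red durations).
n_actions = 2
state = ([20, 3], [40, 0])
next_state = ([10, 3], [0, 10])
w = [0.0] * len(features(*state, 0, n_actions))
w, td = q_update(w, state, action=0, reward=-1.0,
                 next_state=next_state, n_actions=n_actions)
```

Under these assumptions the weight vector has n_actions * (2 * n_lanes + 1) entries and so grows linearly with the number of lanes, whereas table_states grows exponentially. This is the contrast the abstract draws between table-based and feature-based representations.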

CLC Number: TP181

LI Min-shuo, YAO Ming-hai. Q-learning with Feature-based Approximation for Traffic Light Control[J]. Computer Science, 2018, 45(11A): 143-145. https://doi.org/
