计算机科学 ›› 2019, Vol. 46 ›› Issue (3): 327-331.doi: 10.11896/j.issn.1002-137X.2019.03.048

• 交叉与前沿 • 上一篇    下一篇

利用空间优化的增强学习Sarsa改进预取算法

梁媛,袁景凌,陈旻骋   

  1. (武汉理工大学计算机科学与技术学院 武汉 430070)
  • 收稿日期:2018-01-26 修回日期:2018-04-01 出版日期:2019-03-15 发布日期:2019-03-22
  • 通讯作者: 袁景凌(1975-),女,教授,博士生导师,CCF会员,主要研究方向为绿色计算、机器学习和数据挖掘,E-mail:yuanjingling@126.com(通信作者)
  • 作者简介:梁媛(1994-),女,硕士生,CCF学生会员,主要研究方向为机器学习,E-mail:lyuan0416@whut.edu.cn;陈旻骋(1990-),男,博士生,主要研究方向为机器学习、数据挖掘。
  • 基金资助:
    国家自然科学基金(61303029),湖北省自然科学基金重点类项目——创新群体项目(2017CFA012),湖北省技术创新专项重大项目(2017AAA122)资助

Prefetching Algorithm of Sarsa Learning Based on Space Optimization

LIANG Yuan, YUAN Jing-ling, CHEN Min-cheng   

  1. School of Computer Science and Technology,Wuhan University of Technology,Wuhan 430070,China
  • Received:2018-01-26 Revised:2018-04-01 Online:2019-03-15 Published:2019-03-22

摘要: 数据中心是高性能计算机的集群中心,CPU集群运行繁忙,不规则的数据结构和算法频繁使用,使得大多数基于时空局部性的预取技术不再适用。文中引用语义局部性的概念,使用增强学习Sarsa算法来近似语义位置,预测不规则数据结构和算法未来的内存访问。由于状态空间和动态空间过大,采用Deep Q-learning方法优化状态-动作空间,将新状态与旧状态拟合,相似则采取相似的做法,从而提高泛化能力。在标准数据集SPECCPU 2006上的实验证明,所提方法的泛化能力强,能够有效提高Cache的命中率。

关键词: Deep Q-learning, Sarsa, 语义局部性, 预取技术, 状态-动作空间优化

Abstract: As the cluster centers for high-performance computers,data centers are busy with CPU clusters.Irregular data structures and algorithms are frequently used,so that most prefetching technologies based on spatio-temporal locality are no longer applicable.This paper referred to the concept of semantic locality,used the reinforcement learning Sarsa algorithm to approximate semantic locations,and predicted irregular data structures and future memory accesses of algorithms.Due to the large state space and action space,this paper used Deep Q-learning method to optimize the State-action space to fit the new state with the old one and took a similar approach if the two states are similar to improve the generalization ability.The experiment on the standard data set SPECCPU 2006 proves that this method has a widegene-ralization ability and can improve the Cache hit rate effectively.

Key words: Deep Q-learning, Prefetching technology, Sarsa, Semantic locality, State-action space optimization

中图分类号: 

  • TP302
[1]SMITH A J.Cache Memories[J].Acm Computing Surveys,
1982,14(3):473-530.
[2]WENISCH T F.Temporal memory streaming[D].Pittsburgh:Carnegie Mellon University,2007.
[3]SOMOGYI S,WENISCH T F,AILAMAKI A,et al.Spatio-
temporal memory streaming[J].Acm Sigarch Computer Architecture News,2009,37(3):69-80.
[4]COOKSEY R N,JOURDAN S J.Method and apparatus for
next-line prefetching from a predicted memory address:US,US 7093077 B2[P].2006.
[5]JOSEPH D,GRUNWALD D.Prefetching using Markov predictors[C]∥Proceedings of the 24th Annual International Sympo-sium on Computer Architecture.IEEE,1997:252-263.
[6]SOMOGYI S,WENISCH T F,AILAMAKI A,et al.Spatial
Memory Streaming[C]∥International Symposium on Computer Architecture.IEEE,2006:252-263.
[7]SOMOGYI S,WENISCH T F,AILAMAKI A,et al.Spatio-
temporal memory streaming[J].AcmSigarch Computer Architecture News,2009,37(3):69-80.
[8]PELED L,MANNOR S,WEISER U,et al.Semantic locality and context-based prefetching using reinforcement learning[C]∥International Symposium on Computer Architecture.ACM,2015:285-297.
[9]MOZER S,M C,HASSELMO M.Reinforcement Learning:An Introduction[J].IEEE Transactions on Neural Networks,2005,16(1):285-286.
[10]IPEK E,MUTLU O,CARUANA R.Self-Optimizing Memory Controllers:A Reinforcement Learning Approach[C]∥International Symposium on Computer Architecture.IEEE,2008:39-50.
[11]SUTTON R S,MCALLESTER D,SINGH S,et al.Policy Gradient Methods for Reinforcement Learning with Function Approximation[J].Advances in Neural Information Processing Systems,2000,12:1057-1063.
[12]TAN F,YAN P,GUAN X.Deep Reinforcement Learning:From Q-Learning to Deep Q-Learning[C]∥International Conference on Neural Information Processing.Springer,Cham,2017:475-483.
[13]LIN L M,WANG H,WANG Y X.Sarsa Reinforcement Lear-
ning Algorithm Based on Neural Networks[J].Computer Technology & Development,2006(1):30-32.(in Chinese)
林联明,王浩,王一雄.基于神经网络的Sarsa强化学习算法.计算机技术与发展,2006(1):30-32.
[14]DURYEA E,GANGER M,HU W.Exploring Deep Reinforcement Learning with Multi Q-Learning[J].Intelligent Control & Automation,2016,7(4):129-144.
[15]HENNING J L.SPEC CPU2006 benchmark descriptions[J].
Acm Sigarch Computer Architecture News,2006,34(4):1-17.
[1] 李斌, 刘全.
基于最小二乘的双权重学习法
Double Weighted Learning Algorithm Based on Least Squares
计算机科学, 2020, 47(12): 210-217. https://doi.org/10.11896/jsjkx.191100084
[2] 曹新平 刘美华 韩真 古志民 张建鑫.
预取技术研究进展

计算机科学, 2003, 30(8): 28-30.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!