Computer Science, 2015, Vol. 42, Issue (12): 18-22.


Memory Access Optimization for Vector Program of SIMD Form

XU Jin-long, ZHAO Rong-cai, XU Xiao-yan

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001 (all three authors)
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    the National High Technology Research and Development Program of China (863 Program) (2009AA01220) and the "HeGaoJi" (Core Electronic Devices, High-end Generic Chips and Basic Software) National Science and Technology Major Project (2009zx10036-001-001)

Abstract: Vector programs are either written by hand or generated automatically by a compiler. Because both programmers and parallelizing compilers have limits, the resulting vector programs usually leave room for optimization. Optimizing compilers concentrate on transforming serial programs into vector form and rarely optimize the vector code further once it has been generated. We therefore propose a memory access optimization method for vector programs in SIMD form. The method first analyzes whether the program needs optimization; if it does, it applies both in-depth redundancy optimization, which targets redundant vector memory accesses, and alignment optimization to the program. Experimental data show that the proposed method significantly improves the running efficiency of the program, achieving the intended goal.

Key words: Vectorization, Single instruction multiple data (SIMD), Memory access redundancy, Alignment optimization

