Computer Science ›› 2015, Vol. 42 ›› Issue (7): 229-233. doi: 10.11896/j.issn.1002-137X.2015.07.049

• Software and Database Technology •

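Research on Non-full Length Usage of SIMD Vector Instruction

XU Jin-long, ZHAO Rong-cai, ZHAO Bo

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National High Technology Research and Development Program of China (863 Program) (2009AA01220) and the "Core Electronic Devices, High-end General Chips and Basic Software" Major Project (2009zx10036-001-001)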

Abstract: Large-scale SIMD architectures provide stronger hardware support for vector parallelism. However, many loops have too few iterations to supply enough parallelism, which makes them difficult to implement in an equivalent vector form. To use SIMD more effectively, this paper presents a vectorization method that uses SIMD instructions at non-full length. It studies how vector registers are used, implements non-full vector operations and the vectorization of short loops on top of non-full usage of vector registers, and then applies the non-full vectorization method to the vectorization of general loops. A benefit analysis method is provided to guide the vectorization precisely. Experimental results show that the method is effective: the target loops of the selected test cases are vectorized, and the average speedup reaches 1.2.

Key words: Large-scale SIMD, Parallel, Vectorization, Non-full vector operation, Benefit analysis
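
The idea of a non-full vector operation can be illustrated with a small lane-level model. The sketch below is not the paper's implementation: the lane count `VL`, the helper names, and the cost parameters in `is_profitable` are all assumptions chosen for illustration. It pads a short loop's operands into one fixed-width register, performs a single full-width operation, writes back only the active lanes, and uses a hypothetical cost comparison as a stand-in for the benefit analysis.

```python
# Lane-level model of a non-full vector operation (illustrative only).
# VL, the helper names, and all cost values are assumptions, not taken
# from the paper.

VL = 4  # assumed number of lanes in one vector register


def load_partial(values, pad=0.0):
    """Place len(values) elements in the low lanes and pad the rest."""
    if len(values) > VL:
        raise ValueError("too many elements for one register")
    return list(values) + [pad] * (VL - len(values))


def vadd(a, b):
    """One full-width vector add: every lane is computed, padding included."""
    return [x + y for x, y in zip(a, b)]


def store_partial(reg, n):
    """Write back only the n active lanes; padding lanes are discarded."""
    return reg[:n]


def short_loop_add(a, b):
    """Vectorized form of: for i in range(n): c[i] = a[i] + b[i], with n <= VL."""
    n = len(a)
    return store_partial(vadd(load_partial(a), load_partial(b)), n)


def is_profitable(n, scalar_op_cost=1.0, vector_op_cost=1.0, overhead=1.5):
    """Hypothetical benefit check: one vector op plus packing overhead
    must cost less than n scalar ops."""
    return vector_op_cost + overhead < n * scalar_op_cost


result = short_loop_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```

Under these assumed costs, a three-iteration loop is worth vectorizing while a two-iteration loop is not, which is the kind of decision a benefit analysis is meant to make; a real cost model would depend on the target instruction set.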

