Research on Non-full Length Usage of SIMD Vector Instruction

doi:10.11896/j.issn.1002-137X.2015.07.049

Abstract

Abstract: Large-scale SIMD architectures provide stronger hardware support for vector parallelism. However, many loops have too few iterations to supply sufficient parallelism, and it is difficult to vectorize them in an equivalent vector mode. To make full use of SIMD, this paper presents a vectorization method that uses SIMD vector instructions at non-full length. It studies vector register usage and implements non-full vector operations based on non-full-length use of vector registers, which makes it possible to vectorize short loops. The method is then applied to vectorize common loops. In addition, the paper provides a benefit analysis method to guide the vectorization. Experimental results show that the method is effective: the target loops of the selected test programs are vectorized, and the average speedup is about 1.2.
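The idea of a non-full vector operation can be illustrated with a minimal sketch in portable C. This is not the paper's implementation: the 8-lane register width (`VLEN`), the `vreg` type, and the helper names `load_partial`, `add_full`, `store_partial`, and `short_loop_add` are all hypothetical and chosen only for illustration. The sketch assumes a short loop whose trip count `n` is smaller than the vector length. Only `n` lanes are loaded and stored, while the arithmetic itself runs as a single full-width operation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VLEN 8  /* assumed number of float lanes in one vector register */

/* Hypothetical software model of one SIMD vector register. */
typedef struct { float lane[VLEN]; } vreg;

/* Load n (<= VLEN) elements; the unused lanes are zero-filled. */
static vreg load_partial(const float *src, size_t n) {
    vreg v;
    assert(n <= VLEN);
    memset(&v, 0, sizeof v);
    memcpy(v.lane, src, n * sizeof(float));
    return v;
}

/* Full-width add: stands in for a single SIMD add instruction. */
static vreg add_full(vreg a, vreg b) {
    vreg r;
    for (size_t i = 0; i < VLEN; ++i) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

/* Store only the first n lanes; memory past dst[n-1] is untouched. */
static void store_partial(float *dst, vreg v, size_t n) {
    assert(n <= VLEN);
    memcpy(dst, v.lane, n * sizeof(float));
}

/* Vectorized form of the short loop
 *     for (i = 0; i < n; i++) c[i] = a[i] + b[i];
 * where n <= VLEN, so the register is only partly occupied. */
static void short_loop_add(float *c, const float *a, const float *b, size_t n) {
    vreg va = load_partial(a, n);
    vreg vb = load_partial(b, n);
    store_partial(c, add_full(va, vb), n);
}
```

Calling `short_loop_add(c, a, b, 3)` on three-element arrays replaces three scalar additions with one vector addition plus partial loads and a partial store. Whether that trade pays off on real hardware depends on the cost of the partial memory accesses, which is the kind of question a benefit analysis is meant to answer.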

Key words: Large-scale SIMD, Parallel, Vectorization, Non-full vector operation, Benefit analysis

XU Jin-long, ZHAO Rong-cai, ZHAO Bo. Research on Non-full Length Usage of SIMD Vector Instruction[J]. Computer Science, 2015, 42(7): 229-233.
