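Evaluating Intel AVX2 Vgather Instructions with Stencils

Computer Science (计算机科学), 2017, Vol. 44, Issue (1): 20-24. doi: 10.11896/j.issn.1002-137X.2017.01.004

LIN Xin-hua, QIN Qiang, LI Shuo, WEN Min-hua and MATSUOKA Satoshi

Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240; Global Scientific Information and Computing Center, Tokyo Institute of Technology, Tokyo 152-8550, Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, Software and Services Group, Intel Corporation, Portland 999039, Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, Global Scientific Information and Computing Center, Tokyo Institute of Technology, Tokyo 152-8550

Online: 2018-11-13 Published: 2018-11-13

Funding:
This work was supported by the National Key Research and Development Program of China (2014AA01A302, 2016YFB0201800) and the Japan Society for the Promotion of Science RONPAKU Fellowship.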

Abstract

Abstract: Intel introduced the AVX2 vgather instruction on Haswell CPUs to better support reading non-contiguous data during vectorization. Because Stencils use conditional branches to set their boundary conditions, the compiler generates vgather instructions, which degrade Stencil performance on Haswell. We propose using peel optimization or intrinsic loads to avoid generating these vgather instructions. We applied these optimizations to three Stencil benchmarks, the long-range Stencil program 3DFD, and the hybrid Stencil application 3DEW, and achieved speedups ranging from 1.22X to 3.88X on Haswell. By analyzing the implementation of the instruction, we found that a vgather instruction is decoded into multiple micro-operations (μops), with one μop generated for each element to be gathered. Because decoding vgather incurs a high overhead, the instruction becomes the performance bottleneck of Stencils on Haswell. Understanding the implementation of the AVX2 vgather instruction and knowing the optimizations that avoid generating it can serve as a useful reference when tuning applications with good spatial locality on Haswell.

Keywords: AVX2 vgather, Stencil, performance evaluation
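
The peel optimization named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the 3-point 1D stencil, its coefficients, and the function names `stencil_branchy` and `stencil_peeled` are assumptions chosen for illustration. Whether a particular compiler actually emits vgather for the branchy loop depends on the compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3-point 1D stencil; names and coefficients are illustrative. */

/* Branchy version: the boundary condition is evaluated inside the loop, so
 * the neighbor indices are selected per element. A vectorizing compiler may
 * answer such per-element index selection with gather loads instead of
 * contiguous vector loads. */
void stencil_branchy(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        size_t left  = (i == 0)     ? i : i - 1;  /* clamp at the left edge  */
        size_t right = (i == n - 1) ? i : i + 1;  /* clamp at the right edge */
        out[i] = 0.25f * in[left] + 0.5f * in[i] + 0.25f * in[right];
    }
}

/* Peeled version: the two boundary iterations are handled outside the loop,
 * so the interior loop has no branch and touches only in[i-1], in[i],
 * in[i+1], which are contiguous, unit-stride accesses. */
void stencil_peeled(const float *in, float *out, size_t n)
{
    assert(n >= 2);  /* the peeled form needs distinct first and last points */

    /* Peeled first iteration (left neighbor clamped to index 0). */
    out[0] = 0.25f * in[0] + 0.5f * in[0] + 0.25f * in[1];

    /* Branch-free interior. */
    for (size_t i = 1; i + 1 < n; ++i)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];

    /* Peeled last iteration (right neighbor clamped to index n-1). */
    out[n - 1] = 0.25f * in[n - 2] + 0.5f * in[n - 1] + 0.25f * in[n - 1];
}
```

Both functions compute the same result; the peeled form only moves the boundary handling out of the hot loop. To see whether the compiler generated gathers for either variant, one option is to inspect the generated assembly for `vgather` mnemonics.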

LIN Xin-hua, QIN Qiang, LI Shuo, WEN Min-hua and MATSUOKA Satoshi. Evaluating Intel AVX2 Vgather Instructions with Stencils[J]. Computer Science, 2017, 44(1): 20-24. https://doi.org/10.11896/j.issn.1002-137X.2017.01.004

