Computer Science, 2015, Vol. 42, Issue (11): 37-42. doi: 10.11896/j.issn.1002-137X.2015.11.006

Node-level Memory Access Optimization on Intel Knights Corner

LIN Xin-hua, LI Shuo, ZHAO Jia-ming and MATSUOKA Satoshi

  • Online: 2018-11-14   Published: 2018-11-14

Abstract: Traditional programming optimization (TPO) has limited effect on Intel Knights Corner (KNC). We therefore propose memory access optimization (MAO) for KNC. Applying MAO to the TPO version of Diffusion 3D improves its performance by 39.1%. This paper makes two contributions: 1) MAO is indispensable on KNC, and TPO+MAO is the path to Ninja performance, that is, the best optimized performance; 2) intrinsic-based MAO is more efficient for stencil code than compiler-based MAO. Our findings on MAO can inform the optimization of large-scale applications on KNC.

Key words: Traditional programming optimization (TPO), Intel Knights Corner (KNC), Memory access optimization (MAO), Ninja performance
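
For readers unfamiliar with the kind of kernel the abstract refers to, the following is a minimal sketch of one time step of a 7-point 3D diffusion stencil in portable C. It is not the authors' code. The function name `diffusion3d_step`, the coefficient names, the clamped boundary handling, and the one-row prefetch distance are all assumptions made for illustration. The abstract does not list the techniques that make up MAO, so the software prefetch hint is shown only as a generic example of a memory-access-oriented tweak, using the GCC/Clang builtin rather than KNC intrinsics.

```c
#include <assert.h>
#include <stddef.h>

/* One Jacobi-style time step of a 7-point 3D diffusion stencil.
 * Illustrative sketch only: cc weights the centre cell, and cx, cy, cz
 * weight the two neighbours along each axis. At the domain edges a
 * missing neighbour is replaced by the cell itself (clamped boundary). */
void diffusion3d_step(const float *restrict in, float *restrict out,
                      int nx, int ny, int nz,
                      float cc, float cx, float cy, float cz)
{
    assert(in != NULL && out != NULL);
    assert(nx > 0 && ny > 0 && nz > 0);

    const size_t sx = (size_t)nx;           /* stride between rows   */
    const size_t sxy = sx * (size_t)ny;     /* stride between planes */
    const size_t total = sxy * (size_t)nz;

    for (int z = 0; z < nz; z++) {
        for (int y = 0; y < ny; y++) {
            const size_t row = (size_t)z * sxy + (size_t)y * sx;

#if defined(__GNUC__) || defined(__clang__)
            /* Hint that the next row will be read soon. The distance of
             * one row is an arbitrary choice for this sketch. */
            if (row + sx < total) {
                __builtin_prefetch(&in[row + sx], 0, 3);
            }
#endif
            for (int x = 0; x < nx; x++) {
                const size_t c = row + (size_t)x;
                const size_t w = (x == 0)      ? c : c - 1;
                const size_t e = (x == nx - 1) ? c : c + 1;
                const size_t n = (y == 0)      ? c : c - sx;
                const size_t s = (y == ny - 1) ? c : c + sx;
                const size_t b = (z == 0)      ? c : c - sxy;
                const size_t t = (z == nz - 1) ? c : c + sxy;

                out[c] = cc * in[c]
                       + cx * (in[w] + in[e])
                       + cy * (in[n] + in[s])
                       + cz * (in[b] + in[t]);
            }
        }
    }
}
```

The inner loop walks the unit-stride x dimension, which is the access pattern a vectorizing compiler or hand-written intrinsics would target. When the coefficients satisfy cc + 2(cx + cy + cz) = 1, a uniform input field is left unchanged, which gives a quick sanity check for any optimized variant.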

