Computer Science ›› 2015, Vol. 42 ›› Issue (11): 37-42. doi: 10.11896/j.issn.1002-137X.2015.11.006
林新华,李 硕,赵嘉明,松岗聪
LIN Xin-hua, LI Shuo, ZHAO Jia-ming and MATSUOKA Satoshi
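Abstract: Traditional Programming Optimization (TPO) yields little benefit on Intel Knights Corner (KNC), so this paper proposes Memory Access Optimization (MAO). Applying MAO to Diffusion 3D, a program that had already undergone TPO, improved its performance by a further 39.1%. The paper makes two main contributions: 1) it proposes MAO and argues that TPO+MAO helps achieve optimal performance on KNC; 2) it finds that, for stencil code, intrinsic-based MAO is more efficient than compiler-based MAO. These findings are instructive for optimizing large-scale applications on KNC.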