Computer Science ›› 2017, Vol. 44 ›› Issue (10): 64-70. doi: 10.11896/j.issn.1002-137X.2017.10.012
孟德龙,文敏华,韦建文,林新华
MENG De-long, WEN Min-hua, WEI Jian-wen and James LIN
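Abstract: Sunway TaihuLight ranks first on the latest Top500 list, with a peak performance of 125.4 PFlops; its computing power comes mainly from the domestically developed SW26010 many-core processor. OpenFOAM (Open Source Field Operation and Manipulation) is the most widely used open-source software package in computational fluid dynamics, but because it is implemented in C++, it is incompatible with the compiler for the heterogeneous many-core SW26010 processor on Sunway TaihuLight and therefore cannot run efficiently on this architecture directly. This work ports the core computational code of OpenFOAM to the master-core/slave-core architecture of the SW26010 and uses mixed-language programming to resolve the compiler incompatibility. In addition, through optimizations including register communication, vectorization, and double buffering, the performance of a single core group improves by a factor of 8.03 over the optimized master-core code and by a factor of 1.18 over serial execution on an Intel(R) Xeon(R) CPU E5-2695 v3. The single-core-group implementation is also extended to the large-scale Sunway TaihuLight cluster and evaluated in a strong-scalability test, which achieves a 184.9-fold speedup on 256 core groups. The porting approach and optimizations used here can also serve as a reference for bringing other complex C++ programs to Sunway TaihuLight.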
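As a rough reading of the strong-scalability figure, the parallel efficiency can be estimated from the reported speedup. This is only a sketch: the symbols S_p (speedup), p (number of core groups), and E_p (efficiency) are notation introduced here, and it assumes the 184.9-fold speedup is measured against a single core group, which the abstract does not state explicitly.

```latex
E_p = \frac{S_p}{p} = \frac{184.9}{256} \approx 0.722
```

Under that assumption, about 72% of the ideal linear speedup would be retained at 256 core groups.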