Computer Science, 2019, Vol. 46, Issue 4: 321-328. doi: 10.11896/j.issn.1002-137X.2019.04.050
陶小涵, 庞建民, 高伟, 王琦, 姚金阳
TAO Xiao-han, PANG Jian-min, GAO Wei, WANG Qi, YAO Jin-yang
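Abstract: Sunway TaihuLight ("神威·太湖之光") is a supercomputer developed independently in China. Its processor chip is the domestically developed SW26010 heterogeneous many-core processor. Each processor contains 4 core groups, and each core group consists of 1 master core and 64 slave cores. The NPB-FT program solves a three-dimensional partial differential equation using the fast Fourier transform (FFT) and is widely used to evaluate the computation and collective-communication capability of clusters, so using FT to test the multi-level parallel resources and the architecture of Sunway TaihuLight is of practical significance. First, the program is rewritten into a master-slave version with the accelerated thread library, so that the computational kernels execute on the slave cores. Second, register communication among the slave cores and the data transfer channel between the master and slave cores are used to eliminate the data transposition step in FT. Third, computation is overlapped with communication, so that in-core computing resources do not sit idle during inter-core communication. Finally, vectorization and instruction pipelining are applied to improve the program's data-level and instruction-level parallelism. Experimental results: the speedup is 66 for the 3D-32 problem size on a single core, 20 for the 3D-512 problem size on 64 cores, and 46 for the 3D-2048 problem size on 256 cores.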
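To make the role of the transposition step more concrete, the following is a minimal sketch, not the authors' implementation, of how a 3D FFT decomposes into three passes of 1D FFTs. The function names `fft1d` and `fft3d` and the nested-list layout `u[z][y][x]` are assumptions made for illustration. Only the first pass reads contiguous data; the second and third passes must gather strided "pencils", which is the access pattern that a data transposition (or, on SW26010, the register communication described in the abstract) is meant to serve.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor for the forward transform.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(u):
    """3D FFT of a nested list u[z][y][x], done as three passes of 1D FFTs."""
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])

    # Pass 1: along x. Each row u[z][y] is contiguous, so no reordering is needed.
    a = [[fft1d(u[z][y]) for y in range(ny)] for z in range(nz)]

    # Pass 2: along y. Elements are strided, so each pencil is gathered,
    # transformed, and scattered back (a transpose in disguise).
    for z in range(nz):
        for x in range(nx):
            pencil = fft1d([a[z][y][x] for y in range(ny)])
            for y in range(ny):
                a[z][y][x] = pencil[y]

    # Pass 3: along z. Same gather/transform/scatter pattern with a larger stride.
    for y in range(ny):
        for x in range(nx):
            pencil = fft1d([a[z][y][x] for z in range(nz)])
            for z in range(nz):
                a[z][y][x] = pencil[z]

    return a
```

Calling `fft3d` on a grid that holds a single complex plane wave should yield one nonzero coefficient at the matching index, which is a convenient sanity check for all three passes. In a distributed setting the gather in passes 2 and 3 becomes inter-core or inter-node communication, which is why overlapping it with computation matters.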