Computer Science, 2019, Vol. 46, Issue 8: 95-99. doi: 10.11896/j.issn.1002-137X.2019.08.015
WANG Yi-chao1, LIAO Qiu-cheng1, ZUO Si-cheng2, XIE Rui1, LIN Xin-hua1
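Abstract: To explore the value of the ARM architecture for high-performance computing in the field of energy-efficient "green computing", this paper evaluates the performance of an ARM instruction-set processor and compares it with a mainstream commercial processor, the Intel Xeon. At the microarchitecture level, the processor's floating-point throughput, memory bandwidth, and memory latency were measured. The results show that its double-precision floating-point performance is about 475 GFLOPS, 33% lower than that of the Xeon E5-2680v3, while its memory bandwidth is about 105 GB/s, higher than that of the Xeon platform. At the application level, four typical high-performance computing applications, including stencil-based parallel computation, were ported to and compiled on the processor and run with thread binding to improve cache locality and computational performance. The results show that porting applications to the ARM instruction-set processor is straightforward and that the optimization approach is similar to that for mainstream commercial processors such as the Intel Xeon. There is room for improvement on compute-intensive and random-memory-access applications, whereas performance on the stencil application is comparable; combined with its low power consumption, the processor is competitive in the "green computing" field. Future work will continue this line of research on the latest ARM instruction-set chips.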
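The memory-bandwidth figure quoted above is the kind of number usually obtained from a streaming kernel. The following is a minimal sketch of how such a figure is computed, assuming a STREAM-style triad kernel and the common convention of counting three 8-byte transfers per element; whether the authors used exactly this kernel or convention is an assumption, and the function names are hypothetical.

```python
import time


def triad(a, b, c, q):
    """STREAM-style triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n, bytes_per_element=8):
    """Bytes counted for one triad pass: read b, read c, write a.

    Assumption: write-allocate traffic is ignored, so each element
    contributes three transfers of bytes_per_element bytes.
    """
    return 3 * n * bytes_per_element


def bandwidth_gb_per_s(n, seconds, bytes_per_element=8):
    """Sustained bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    return triad_bytes(n, bytes_per_element) / seconds / 1e9


def measure(n=200_000, q=3.0, repeats=3):
    """Run the triad several times and report the best (shortest) pass."""
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        triad(a, b, c, q)
        best = min(best, time.perf_counter() - start)
    return a, bandwidth_gb_per_s(n, best)


if __name__ == "__main__":
    _, gbs = measure()
    print(f"triad bandwidth (pure Python, illustrative only): {gbs:.3f} GB/s")
```

Interpreted Python comes nowhere near hardware bandwidth, so the sketch only illustrates the bookkeeping. A real measurement would use a compiled, multithreaded kernel with arrays much larger than the last-level cache and with threads bound to cores, as the abstract describes for the application runs.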