Computer Science, 2019, Vol. 46, Issue (8): 95-99. doi: 10.11896/j.issn.1002-137X.2019.08.015

• HPC China 2018 •

Performance Evaluation of ARM-ISA SoC for High Performance Computing

WANG Yi-chao1, LIAO Qiu-cheng1, ZUO Si-cheng2, XIE Rui1, LIN Xin-hua1   

  1. Network & Information Center, Shanghai Jiao Tong University, Shanghai 200240, China
  2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2019-01-20 Online: 2019-08-15 Published: 2019-08-16

Abstract: To compare against Intel Xeon processors for high performance computing, this paper evaluates the floating-point computing capacity, memory access bandwidth, and memory latency of an ARM-ISA-based SoC. Its double-precision floating-point capacity is about 475 GFLOPS, only 66% of that of the Intel Xeon E5-2680v3. Its memory bandwidth is 105 GB/s, better than that of the Xeon processor. The paper also ports four scientific computing applications, including stencil-based codes, to this SoC. The experiments show that the performance of two stencil applications on this SoC is close to that on Intel Xeon processors, and that thread mapping for cache locality is an effective performance optimization method for this SoC. Further performance studies on the new generation of ARM server SoCs are left to future work.

Key words: ARMv8, performance evaluation, processor

CLC Number: TP391