计算机科学 ›› 2020, Vol. 47 ›› Issue (8): 98-104.doi: 10.11896/jsjkx.191000013
所属专题: 高性能计算
袁欣辉, 林蓉芬, 魏迪, 尹万旺, 徐金秀
YUAN Xin-hui, LIN Rong-fen, WEI Di, YIN Wan-wang, XU Jin-xiu
摘要: 近年来, 人们越来越关注计算机对数据密集型课题的处理能力。宽度优先搜索(Breadth First Search, BFS)是一种典型的数据密集型课题, 被广泛应用于多种图算法。Graph 500 Benchmark以BFS搜索为核心算法, 已经成为评价计算机处理大数据能力的基准。神威太湖之光超级计算机从2016年6月至2017年11月连续4次荣登Top 500榜单榜首, 其处理器SW26010是首款由我国自主研制的异构众核处理器。文中研究了如何利用SW26010的体系结构特点加速BFS算法的问题, 在SW26010上实现了基于单个核组的方向优化的融合BFS算法, 使用字节图(bytemap)释放内层循环依赖性, 利用异步DMA隐藏计算与便签存储器的访问开销, 利用异构架构协同运算并对图做预处理。最终, 以Graph 500作为基准测试程序处理scale为22的图, SW26010处理器单核组BFS的性能达到457.54MTEPS。
中图分类号:
[1] Graph 500[OL].http://www.graph500.org. [2] AGARWAL V, PETRINI F, PASETTO D, et al.Scalable Graph Exploration on Multicore Processors∥International Conference for High Performance Computing, Networking, Storage & Analysis.IEEE, 2010:1-11. [3] UENO K, SUZUMURA T.Highly scalable graph search for the Graph500 benchmark[C]∥International Symposium on High-performance Parallel & Distributed Computing.2012:149-160. [4] TITHI J J, MATANI D, MENGHANI G, et al.Avoiding Locks and Atomic Instructions in Shared-Memory Parallel BFS Using Optimistic Parallelization[C]∥IEEE International Symposium on Parallel & Distributed Processing.IEEE, 2013:1628-1637. [5] CHHUGANI J, SATISH N, KIM C, et al.Fast and EfficientGraph Traversal Algorithm for CPUs:Maximizing Single-Node Efficiency[C]∥IEEE International Parallel & Distributed Processing Symposium.2012:378-389. [6] BEAMER S, BULUC A, ASANOVIC K, et al.DistributedMemory Breadth-First Search Revisited:Enabling Bottom-Up Search[C]∥IEEE International Parallel & Distributed Proces-sing Symposium Workshops &Phd Forum.2013:1618-1627. [7] YASUI Y, FUJISAWA K, GOTO K.NUMA-optimized parallelbreadth-first search on multicore single-node system[C]∥IEEE International Conference on Big Data.IEEE, 2013:394-402. [8] YASUI Y, FUJISAWA K, SATO Y.Fast and Energy-efficientBreadth-First Search on a Single NUMA System[M]∥Supercomputing.Cham:Springer, 2014:365-381. [9] TAO G, YUTONG L, GUANG S.Using MIC to Accelerate aTypical Data-Intensive Application:The Breadth-first Search[C]∥IEEE International Symposium on Parallel & Distributed Processing.2013:1117-1125. [10]BUSATO F, BOMBIERI N.BFS-4K:an Efficient Implementa-tion of BFS for Kepler GPU Architectures[J].IEEE Transactions on Parallel & Distributed Systems, 2014(1):1-1. [11]ZOU D, DOU Y, WANG Q, et al.Direction-Optimizing Breadth-First Search on CPU-GPU Heterogeneous Platforms[C]∥IEEE International Conference on High Performance Computing & Communications & IEEE International Conference on Embedded & Ubiquitous Computing.IEEE, 2013:1064-1069. [12]CHECCONI F, PETRINI F, WILLCOCK J, et al.Breaking thespeed and scalability Barriers for Graph exploration on distributed-memory machines[J].International Conference for High Performance Computing, Networking, Storage & Analysis, 2012, 7196(5):1-12. [13]FU H, LIAO J, YANG J, et al.The Sunway TaihuLight supercomputer:system and applications[J].Sciece China Information Sciences, 2016, 59:1-16 [14]DUANE M, MICHAEL G, ANDREW G.Scalable GPU graph traversal[J].Acm Sigplan Notices, 2012, 47(8). [15]FU Z S, HARISH K D, BRADLEY B, et al.Parallel breadth first search on GPU clusters[C]∥2014 IEEE International Conference on Big Data(Big Data).IEEE, 2014. [16]PETER Z, ERIC H, JOHN M, et al.Dynamic parallelism for simple and efficient GPU graph algorithms[C]∥Proceedings of the 5th Workshop on Irregular Applications:Architectures and Algorithms.ACM, 2015. [17]YANG H, HANG L, HUANG H H.High-performance triangle counting on gpus[C]∥2018 IEEE High Performance extreme Computing Conference(HPEC).IEEE, 2018. |
[1] | 陈鑫, 李芳, 丁海昕, 孙唯哲, 刘鑫, 陈德训, 叶跃进, 何香. 面向国产异构众核架构的CFD非结构网格计算并行优化方法 Parallel Optimization Method of Unstructured-grid Computing in CFD for DomesticHeterogeneous Many-core Architecture 计算机科学, 2022, 49(6): 99-107. https://doi.org/10.11896/jsjkx.210400157 |
[2] | 朱雨, 庞建民, 徐金龙, 陶小涵, 王军. 面向SW26010处理器的三维Stencil自适应分块参数算法 Adaptive Tiling Size Algorithm for 3D Stencil Computation on SW26010 Many-core Processor 计算机科学, 2021, 48(6): 10-18. https://doi.org/10.11896/jsjkx.200700059 |
[3] | 何亚茹, 庞建民, 徐金龙, 朱雨, 陶小涵. 基于神威平台的Floyd并行算法的实现和优化 Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform 计算机科学, 2021, 48(6): 34-40. https://doi.org/10.11896/jsjkx.201100051 |
[4] | 郭杰, 高希然, 陈莉, 傅游, 刘颖. 用数据驱动的编程模型并行多重网格应用 Parallelizing Multigrid Application Using Data-driven Programming Model 计算机科学, 2020, 47(8): 32-40. https://doi.org/10.11896/jsjkx.200500093 |
[5] | 吕小敬, 刘钊, 褚学森, 石树鹏, 孟虹松, 黄震春. 面向超大规模并行模拟的LBM计算流体力学软件 Extreme-scale Simulation Based LBM Computing Fluid Dynamics Simulations 计算机科学, 2020, 47(4): 13-17. https://doi.org/10.11896/jsjkx.191000010 |
[6] | 倪鸿, 刘鑫. 非结构网格下稀疏下三角方程求解器众核优化技术研究 Many-core Optimization for Sparse Triangular Solver Under Unstructured Grids 计算机科学, 2019, 46(6A): 518-522. |
[7] | 陶小涵, 庞建民, 高伟, 王琦, 姚金阳. 基于SW26010处理器的FT程序的性能优化 Performance Optimization of FT Program Based on SW26010 Processor 计算机科学, 2019, 46(4): 321-328. https://doi.org/10.11896/j.issn.1002-137X.2019.04.050 |
[8] | 姚庆, 郑凯, 刘垚, 王肃, 孙军, 徐梦轩. SOM算法在申威众核上的实现和优化 Implementation and Optimization of SOM Algorithm on Sunway Many-core Processors 计算机科学, 2018, 45(11A): 591-596. |
[9] | 洪菁. 大数据时代的思维特点研究 Research on Characteristics of Thinking in Era of Big Data 计算机科学, 2016, 43(Z11): 472-473. https://doi.org/10.11896/j.issn.1002-137X.2016.11A.106 |
[10] | 陈伟,SMIELIAUSKAS Wally. 大数据环境下的电子数据审计:机遇、挑战与方法 Opportunities,Challenges and Methods of Electric Data Auditing in Big Data Environments 计算机科学, 2016, 43(1): 8-13. https://doi.org/10.11896/j.issn.1002-137X.2016.01.002 |
[11] | 许瑾晨,郭绍忠,黄永忠,王磊. 面向异构众核从核的数学函数库访存优化方法 Access Optimization Technique for Mathematical Library of Slave Processors on Heterogeneous Many-core Architectures 计算机科学, 2014, 41(6): 12-17. https://doi.org/10.11896/j.issn.1002-137X.2014.06.003 |
[12] | 张建勋,古志民. 帮助线程预取技术研究综述 Survey of Helper Thread Prefetching 计算机科学, 2013, 40(7): 19-23. |
[13] | 杜薇,崔国华,刘伟,石飞燕,位凯志. 云环境下面向数据密集型应用的数据选择策略研究 Data Selection Strategy for Data-intensive Applications in Cloud 计算机科学, 2012, 39(6): 30-34. |
[14] | 薛卫,梁敬东,林金星. 基于肤色信息与宽度优先搜索的AAM人脸特征定位算法 AAM Facial Feature Localization Algorithm Based on Skin Model and Breadth-First Search 计算机科学, 2011, 38(8): 275-277. |
[15] | 石宣化 金海. 有服务质量保证的数据密集型网格应用管理研究 计算机科学, 2007, 34(6): 131-135. |
|