计算机科学 ›› 2019, Vol. 46 ›› Issue (1): 314-319.doi: 10.11896/j.issn.1002-137X.2019.01.049

• 交叉与前沿 • 上一篇    下一篇

众核平台上广度优先搜索算法的优化

徐启泽, 韩文廷, 陈俊仕, 安虹   

  1. (中国科学技术大学计算机科学与技术学院 合肥230027)
  • 收稿日期:2018-01-18 出版日期:2019-01-15 发布日期:2019-02-25
  • 作者简介:徐启泽(1992-),男,硕士生,主要研究方向为高性能计算;韩文廷(1965-),男,博士,副教授,主要研究方向为高性能计算;陈俊仕(1990-),男,博士生,主要研究方向为多核并行编程;安 虹(1963-),女,博士,教授,主要研究方向为超大规模并行计算机系统结构、大数据并行存储与处理系统、高性能计算,E-mail:han@ustc.edu.cn(通信作者)。
  • 基金资助:
    国家重点研发计划(2016YFB0201902)资助

Optimization of Breadth-first Search Algorithm Based on Many-core Platform

XU Qi-ze, HAN Wen-ting, CHEN Jun-shi, AN Hong   

  1. (School of Computer Science and Technology,University of Science and Technology of China,Hefei 230027,China)
  • Received:2018-01-18 Online:2019-01-15 Published:2019-02-25

摘要: 图算法在多个领域具有重要的应用价值。随着社会信息化程度的提高,需要处理的图数据量越来越大,图算法的性能已成为研究热点。广度优先搜索算法是一种重要的图算法,研究它的性能优化技术可以为其他图算法的性能优化提供借鉴。目前,在新一代Xeon Phi众核处理器上的工作均基于自顶向下算法且没有考虑到非均匀访存(NUMA)对性能的影响。文中以混合广度优先搜索算法为基础,结合NUMA拓扑结构,从任务分配、向量化和数据预处理3个方面展开优化,在Xeon Phi平台上设计并实现了高性能并行广度优先搜索算法。一系列实验结果表明,优化后的算法在不同规模的测试数据上与Graph500官方优化的算法相比取得了50%~145%的性能提升。

关键词: 非均匀访存, 广度优先搜索, 向量化, 性能优化, 众核架构

Abstract: Graph algorithms have important application value in many fields.With the development of social informatization,the amount of graph data to be processed has become larger and larger,the performance of graph algorithms has become a research hotspot.Breadth-first search algorithm is an important graph algorithm,studying its performance optimization techniques can provide reference for the performance optimization of other graph algorithms.The current work on new generation Xeon Phi many-core processors are based on top-down algorithm and do not take into account NUMA effects on performance.Based on the hybrid breadth-first search algorithm and the NUMA topology,this paper optimized Breadth-first search algorithm from the perspective of tasks allocation,vectorization and data preprocessing,designed and implemented a high-performance parallel breadth-first search algorithm on the Xeon Phi platform.A series of experimental results show that the optimized algorithm has gained 50%~145% improvement compared with Graph500 official tuned algorithm on different scales of test data.

Key words: Breadth-first search, Many-core architecture, NUMA, Performance optimization, Vectorization

中图分类号: 

  • TP302
[1]LIU H,HUANG H H.Enterprise:breadth-first graph traversal on GPUs[C]//SC15:International Conference for High Performance Computing,Networking,Storage and Analysis.2015:1-12.<br /> [2]BEAMER S,BULUC A,ASANOVIC K,et al.Distributed Memory Breadth-First Search Revisited:Enabling Bottom-Up Search[C]//2013 IEEE International Symposium on Parallel & Distributed Processing,Workshops and Phd Forum.2013:1618-1627.<br /> [3]LIN H,TANG X C,YU B W,et al.Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores[C]//International Parallel and Distributed Processing Symposium (IPDPS).2017:635-645.<br /> [4]PAREDES M,RILEY G,LUJAN M.Breadth first search vectorization on the Intel Xeon Phi[C]//Proceedings of the ACM International Conference on Computing Frontiers (CF’16).2016:1-10.<br /> [5]GAO T,LU Y T,ZHANG B D,et al.Using the Intel Many Integrated Core to accelerate graph traversal[J].International Journal of High Performance Computing Applications,2014,28(3):255-266<br /> [6]BESTA M,MARENDING F,SOLOMONIK E,et al.SlimSell:A Vectorizable Graph Representation for Breadth-First Search[C]//2017 IEEE International Parallel and Distributed Proces-sing Symposium (IPDPS).2017:32-41.<br /> [7]AGARWAL V,PETRINI F,PASETTO D,et al.Scalable Graph Exploration on Multicore Processors[C]//In Proceedings of the 2010 ACM/IEEE International Conference for High Perfor-mance Computing,Networking,Storage and Analysis (SC’10).2010:1-11.<br /> [8]YASUI Y,FUJISAWA K.Fast and scalable NUMA-based thread parallel breadth-first search[C]//2015 International Conference on High Performance Computing & Simulation (HPCS).2015:377-385.<br /> [9]Graph500[OL].https://graph500.org.<br /> [10]LESKOVEC J,CHAKRABARTI D,KLEINBERG J,et al.Kronecker Graphs:An Approach to Modeling Networks[J].The Journal of Machine Learning Research,2010,11(3):985-1042.<br /> [11]Sparse Matrix[OL].https://en.wikipedia.org/wiki/Sparse_ma-trix.<br /> [12]SODANI A.Knights landing (KNL):2nd Generation Intel<sup>®</sup> Xeon Phi processor[C]//2015 IEEE Hot Chips 27 Symposium (HCS).2015:1-24.<br /> [13]HENG D D,TANG Y H,YI X D,et al.Implementation and performance analysis of BFS algorithm on a parallel prototype system[J].Computer Engineering & Science,2017,39(1):27-34.(in Chinese)<br /> 衡冬冬,唐玉华,易晓东,等.并行原型系统上BFS算法设计实现与测试分析[J].计算机工程与科学,2017,39(1):27-34.
[1] 陈钧吾, 余华山.
面向无尺度图的Δ-stepping算法改进策略
Strategies for Improving Δ-stepping Algorithm on Scale-free Graphs
计算机科学, 2022, 49(6A): 594-600. https://doi.org/10.11896/jsjkx.210400062
[2] 叶跃进, 李芳, 陈德训, 郭恒, 陈鑫.
基于国产众核架构的非结构网格分区块重构预处理算法研究
Study on Preprocessing Algorithm for Partition Reconnection of Unstructured-grid Based on Domestic Many-core Architecture
计算机科学, 2022, 49(6): 73-80. https://doi.org/10.11896/jsjkx.210900045
[3] 鄂海红, 张田宇, 宋美娜.
基于Web的数据可视化图表渲染优化方法
Web-based Data Visualization Chart Rendering Optimization Method
计算机科学, 2021, 48(3): 119-123. https://doi.org/10.11896/jsjkx.200600038
[4] 张晓, 张思蒙, 石佳, 董聪, 李战怀.
Ceph分布式存储系统性能优化技术研究综述
Review on Performance Optimization of Ceph Distributed Storage System
计算机科学, 2021, 48(2): 1-12. https://doi.org/10.11896/jsjkx.201000149
[5] 金琪, 王俊昌, 付雄.
基于智能放置策略的Cuckoo哈希表
Cuckoo Hash Table Based on Smart Placement Strategy
计算机科学, 2020, 47(8): 80-86. https://doi.org/10.11896/jsjkx.191200109
[6] 徐江峰谭玉龙.
基于机器学习的HBase配置参数优化研究
Research on HBase Configuration Parameter Optimization Based on Machine Learning
计算机科学, 2020, 47(6A): 474-479. https://doi.org/10.11896/JsJkx.190900046
[7] 张彭奕, 宋杰.
区块链共识算法效能优化研究进展
Research Advance on Efficiency Optimization of Blockchain Consensus Algorithms
计算机科学, 2020, 47(12): 296-303. https://doi.org/10.11896/jsjkx.200700020
[8] 徐传福,王曦,刘舒,陈世钊,林玉.
基于Python的大规模高性能LBM多相流模拟
Large-scale High-performance Lattice Boltzmann Multi-phase Flow Simulations Based on Python
计算机科学, 2020, 47(1): 17-23. https://doi.org/10.11896/jsjkx.190500009
[9] 李芳,李志辉,徐金秀,范昊,褚学森,李新亮.
基于十亿亿次国产超算系统的流体力学软件众核适应性研究
Research on Adaptation of CFD Software Based on Many-core Architecture of 100P Domestic Supercomputing System
计算机科学, 2020, 47(1): 24-30. https://doi.org/10.11896/jsjkx.181102176
[10] 张凌浩, 桂盛霖, 穆逢君, 王胜.
基于后缀树的二进制可执行代码的克隆检测算法
Clone Detection Algorithm for Binary Executable Code with Suffix Tree
计算机科学, 2019, 46(10): 141-147. https://doi.org/10.11896/jsjkx.180801573
[11] 周蓓, 黄永忠, 许瑾晨, 郭绍忠.
向量数学库的向量化方法研究
Study on SIMD Method of Vector Math Library
计算机科学, 2019, 46(1): 320-324. https://doi.org/10.11896/j.issn.1002-137X.2019.01.050
[12] 姚金阳, 赵荣彩, 王琦, 李颖颖.
面向间接数组索引的向量化方法
Vectorization Methods for Indirect Array Index
计算机科学, 2018, 45(9): 220-223. https://doi.org/10.11896/j.issn.1002-137X.2018.09.036
[13] 孙涛, 张俊星.
SDN性能优化技术研究综述
Review of SDN Performance Optimization Technology
计算机科学, 2018, 45(11A): 84-91.
[14] 赵澄, 陈君新, 姚明海.
基于SVM分类器的XSS攻击检测技术
XSS Attack Detection Technology Based on SVM Classifier
计算机科学, 2018, 45(11A): 356-360.
[15] 吴卫祖,刘利群,谢冬青.
基于神经网络的异构网络向量化表示方法
Vectorized Representation of Heterogeneous Network Based on Neural Networks
计算机科学, 2017, 44(5): 272-275. https://doi.org/10.11896/j.issn.1002-137X.2017.05.049
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!