计算机科学 ›› 2021, Vol. 48 ›› Issue (6): 34-40.doi: 10.11896/jsjkx.201100051

• 计算机体系结构* • 上一篇    下一篇

基于神威平台的Floyd并行算法的实现和优化

何亚茹1, 庞建民1,2, 徐金龙2, 朱雨2, 陶小涵2   

  1. 1 郑州大学中原网络安全研究院 郑州450000
    2 信息工程大学网络空间安全学院 郑州450000
  • 收稿日期:2020-11-05 修回日期:2021-03-21 出版日期:2021-06-15 发布日期:2021-06-03
  • 通讯作者: 庞建民(jianmin_pang@126.com)
  • 基金资助:
    之江实验室重大科研基金资助项目(2018FD0ZX01)

Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform

HE Ya-ru1, PANG Jian-min1,2, XU Jin-long2, ZHU Yu2, TAO Xiao-han2   

  1. 1 Zhong Yuan Network Security Research Institute,Zhengzhou University,Zhengzhou 450000,China
    2 School of Cyberspace Security,Information Engineering University,Zhengzhou 450000,China
  • Received:2020-11-05 Revised:2021-03-21 Online:2021-06-15 Published:2021-06-03
  • About author:HE Ya-ru,born in 1994,postgraduate.Her main research interests include high-performance computing and so on.(i_heyr@163.com)
    PANG Jian-min,born in 1964,professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include high-perfor-mance computing and information security.
  • Supported by:
    Major Scientific Project of Zhejiang Lab Advanced Industrial Network Security Platform(2018FD0ZX01).

摘要: 求解全源最短路径的Floyd算法是许多实际应用基础上的关键构建块,由于其时间复杂度较高,串行Floyd算法不适用于大规模输入图计算,针对不同平台的并行Floyd算法设计可为解决现实问题提供有效帮助。针对Floyd算法与国产自主研发处理器匹配滞后的问题,首次提出基于神威平台的Floyd并行算法的实现和优化。根据SW26010处理器主-从核架构的特点,采用主从加速编程模型进行并行实现,并分析了影响该算法性能的关键因素,通过算法优化、数组划分和双缓冲技术进行优化,逐步提升算法性能。测试结果表明,与主核上串行算法相比,基于神威平台的Floyd并行算法在单个SW26010处理器上可以获得106倍的最高加速。

关键词: Floyd算法, SW26010, 并行计算, 数组划分

Abstract: The Floyd algorithm for finding shortest paths in a weighted graph is a key building block which is used frequently in a variety of practical applications.However,the Floyd algorithm cannot scale to large-scale graphs due to its time complexity.Its parallel implementations for different architectures are thus proposed and have been proved effective.To address the mismatching between existing ineffective parallel implementation of the Floyd algorithm and domestically designed processors,this paper implements and optimizes the Floyd algorithm targeting the Sunway platform.More specifically,this paper implements the algorithm using the programming model designed for the heterogeneous architecture of the Sunway TaihuLight,and captures the performance bottleneck when executed on the target.This paper next improves the performance of the Floyd algorithm by means of algorithmic optimization,array partitioning and double buffering.The experimental results show that the implementation of the Floyd algorithm on the Sunway platform can achieve the highest speedup of 106X over the sequential version executed on the managing processing element of the SW26010 processor.

Key words: Array partitioning, Floyd algorithm, Parallel computing, SW26010

中图分类号: 

  • TP391
[1]DOR D,HALPERIN S,ZWICK U.All pairs almost shortest paths[C]//SIAM Journal on Computing.1996,29(5):452-461.
[2]CHAN T M.More algorithms for all-pairs shortest paths inweighted graphs[J].Proceedings of the Annual ACM Sympo-sium on Theory of Computing,2010,39:2075-2089.
[3]LI J W,ZHANG J,ZHAO J C,et al.A Load Balancing Shortest Path Routing Algorithm for SRIO Network[J].Computer Engineering,2020,46(3):214-221,228.
[4]DONGARRA J.Report on the sunway taihulight system[D].Knoxville:University of Tennessee,2016.
[5]LIU X,GUO H,SUN R J,et al.The Characteristic Analysis and Exascale Scalability Research of Large Scale Parallel Applications on Sunway TaihuLight Supercomputer[J].Chinese Journal of Computers,2018,41(10):2209-2220.
[6]LI M,YANG C,SUN Q,et al.Enabling highly efficient k-Means computations on the SW26010 many-core processor of sunway taihulight[J].Journal of Computer Science & Technology,2019,34(1):77-93.
[7]NI H,LIU X.Multi-Core optimization technology of unstructured grid based on Sunway TaihuLight[J].Computer Engineering,2019,45(6):45-51.
[8]XU Z,LIN J,MATSUOKA S.Benchmarking SW26010 many-core processor[C]//Parallel & Distributed Processing Sympo-sium Workshops.IEEE,2017.
[9]FLOYD R W.Algorithm 97,Shortest path algorithms[J].Communications of the ACM,1962,5(6):345.
[10]VENKATARAMAN G,SAHNI S,MUKHOPADHYAYA S.A blocked all-pairs shortest-paths algorithm[C]//Scandinavian Workshop on Algorithm Theory.Berlin,Heidelberg:Springer,2000.
[11]ZHANG D Q,WU G L,LIU D F.Accelerated and OptimizedMethod of Floyd Algorithm to Find out Shortest Path[J].Computer Engineering and Applications,2009(17):45-47,50.
[12]ZUO X F,SHEN W J.Improved Algorithm about Multi-shortest Path Problem Based on Floyd Algorithm[J].Computer Science,2017,44(5):238-240,273.
[13]LU L G,LIU L Y,LU T D,et al.A Modified Floyd Algorithm[J].Journal of East China University of Technology,2019,42(1):81-84.
[14]SRINIVASAN T,BALAKRISHNAN R,GANGADHARAN S A,et al.A scalable parallelization of all-pairs shortest path algorithm for a high performance cluster environment[C]//International Conference on Parallel & Distributed Systems.IEEE,2007.
[15]TESKEREDZIC E,KARAHODZIC K,NOSOVIC N.Comparison of the non-blocked and blocked floyd-warshall algorithm with regard to speedup and energy saving on an embedded GPU[C]//19th International Symposium INFOTEH-JAHORINA.2020:18-20.
[16]BONDHUGULA U,DEVULAPALLI A,FERNANDO J,et al.Parallel FPGA-based all-pairs shortest-paths in a directed graph[C]//20th International Parallel and Distributed Processing Symposium(IPDPS 2006).IEEE,2006.
[17]XING X X,ZHAO G X,LUO Z Y,et al.GPU-based Algorithm of Shortest Path[J].Computer Science,2012,39(3):299-303.
[18]FOSTER I.Designing and building parallel programs:concepts and tools for parallel software engineering[J].Tetrahedron Letters,1995,11(3):296-300.
[1] 陈鑫, 李芳, 丁海昕, 孙唯哲, 刘鑫, 陈德训, 叶跃进, 何香.
面向国产异构众核架构的CFD非结构网格计算并行优化方法
Parallel Optimization Method of Unstructured-grid Computing in CFD for DomesticHeterogeneous Many-core Architecture
计算机科学, 2022, 49(6): 99-107. https://doi.org/10.11896/jsjkx.210400157
[2] 傅天豪, 田鸿运, 金煜阳, 杨章, 翟季冬, 武林平, 徐小文.
一种面向构件化并行应用程序的性能骨架分析方法
Performance Skeleton Analysis Method Towards Component-based Parallel Applications
计算机科学, 2021, 48(6): 1-9. https://doi.org/10.11896/jsjkx.201200115
[3] 朱雨, 庞建民, 徐金龙, 陶小涵, 王军.
面向SW26010处理器的三维Stencil自适应分块参数算法
Adaptive Tiling Size Algorithm for 3D Stencil Computation on SW26010 Many-core Processor
计算机科学, 2021, 48(6): 10-18. https://doi.org/10.11896/jsjkx.200700059
[4] 冯凯, 马鑫玉.
(n,k)-冒泡排序网络的子网络可靠性
Subnetwork Reliability of (n,k)-bubble-sort Networks
计算机科学, 2021, 48(4): 43-48. https://doi.org/10.11896/jsjkx.201100139
[5] 胡蓉, 阳王东, 王昊天, 罗辉章, 李肯立.
基于GPU加速的并行WMD算法
Parallel WMD Algorithm Based on GPU Acceleration
计算机科学, 2021, 48(12): 24-28. https://doi.org/10.11896/jsjkx.210600213
[6] 马梦宇, 吴烨, 陈荦, 伍江江, 李军, 景宁.
显示导向型的大规模地理矢量实时可视化技术
Display-oriented Data Visualization Technique for Large-scale Geographic Vector Data
计算机科学, 2020, 47(9): 117-122. https://doi.org/10.11896/jsjkx.190800121
[7] 陈国良, 张玉杰.
并行计算学科发展历程
Development of Parallel Computing Subject
计算机科学, 2020, 47(8): 1-4. https://doi.org/10.11896/jsjkx.200600027
[8] 阳王东, 王昊天, 张宇峰, 林圣乐, 蔡沁耘.
异构混合并行计算综述
Survey of Heterogeneous Hybrid Parallel Computing
计算机科学, 2020, 47(8): 5-16. https://doi.org/10.11896/jsjkx.200600045
[9] 袁欣辉, 林蓉芬, 魏迪, 尹万旺, 徐金秀.
面向国产异构众核处理器SW26010的BFS优化方法
Optimization of BFS on Domestic Heterogeneous Many-core Processor SW26010
计算机科学, 2020, 47(8): 98-104. https://doi.org/10.11896/jsjkx.191000013
[10] 冯凯, 李婧.
k元n方体的子网络可靠性研究
Study on Subnetwork Reliability of k-ary n-cubes
计算机科学, 2020, 47(7): 31-36. https://doi.org/10.11896/jsjkx.190700170
[11] 吕小敬, 刘钊, 褚学森, 石树鹏, 孟虹松, 黄震春.
面向超大规模并行模拟的LBM计算流体力学软件
Extreme-scale Simulation Based LBM Computing Fluid Dynamics Simulations
计算机科学, 2020, 47(4): 13-17. https://doi.org/10.11896/jsjkx.191000010
[12] 杨宗霖, 李天瑞, 刘胜久, 殷成凤, 贾真, 珠杰.
基于Spark Streaming的流式并行文本校对
Streaming Parallel Text Proofreading Based on Spark Streaming
计算机科学, 2020, 47(4): 36-41. https://doi.org/10.11896/jsjkx.190300070
[13] 邓定胜.
一种改进的DBSCAN算法在Spark平台上的应用
Application of Improved DBSCAN Algorithm on Spark Platform
计算机科学, 2020, 47(11A): 425-429. https://doi.org/10.11896/jsjkx.190700071
[14] 徐传福,王曦,刘舒,陈世钊,林玉.
基于Python的大规模高性能LBM多相流模拟
Large-scale High-performance Lattice Boltzmann Multi-phase Flow Simulations Based on Python
计算机科学, 2020, 47(1): 17-23. https://doi.org/10.11896/jsjkx.190500009
[15] 徐磊, 陈荣亮, 蔡小川.
基于非结构化网格的高可扩展并行有限体积格子
Scalable Parallel Finite Volume Lattice Boltzmann Method Based on Unstructured Grid
计算机科学, 2019, 46(8): 84-88. https://doi.org/10.11896/j.issn.1002-137X.2019.08.013
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!