计算机科学 ›› 2021, Vol. 48 ›› Issue (6): 34-40.doi: 10.11896/jsjkx.201100051

• 计算机体系结构* • 上一篇    下一篇

基于神威平台的Floyd并行算法的实现和优化

何亚茹1, 庞建民1,2, 徐金龙2, 朱雨2, 陶小涵2   

  1. 1 郑州大学中原网络安全研究院 郑州450000
    2 信息工程大学网络空间安全学院 郑州450000
  • 收稿日期:2020-11-05 修回日期:2021-03-21 出版日期:2021-06-15 发布日期:2021-06-03
  • 通讯作者: 庞建民(jianmin_pang@126.com)
  • 基金资助:
    之江实验室重大科研基金资助项目(2018FD0ZX01)

Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform

HE Ya-ru1, PANG Jian-min1,2, XU Jin-long2, ZHU Yu2, TAO Xiao-han2   

  1. 1 Zhong Yuan Network Security Research Institute,Zhengzhou University,Zhengzhou 450000,China
    2 School of Cyberspace Security,Information Engineering University,Zhengzhou 450000,China
  • Received:2020-11-05 Revised:2021-03-21 Online:2021-06-15 Published:2021-06-03
  • About author:HE Ya-ru,born in 1994,postgraduate.Her main research interests include high-performance computing and so on.(i_heyr@163.com)
    PANG Jian-min,born in 1964,professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include high-perfor-mance computing and information security.
  • Supported by:
    Major Scientific Project of Zhejiang Lab Advanced Industrial Network Security Platform(2018FD0ZX01).

摘要: 求解全源最短路径的Floyd算法是许多实际应用基础上的关键构建块,由于其时间复杂度较高,串行Floyd算法不适用于大规模输入图计算,针对不同平台的并行Floyd算法设计可为解决现实问题提供有效帮助。针对Floyd算法与国产自主研发处理器匹配滞后的问题,首次提出基于神威平台的Floyd并行算法的实现和优化。根据SW26010处理器主-从核架构的特点,采用主从加速编程模型进行并行实现,并分析了影响该算法性能的关键因素,通过算法优化、数组划分和双缓冲技术进行优化,逐步提升算法性能。测试结果表明,与主核上串行算法相比,基于神威平台的Floyd并行算法在单个SW26010处理器上可以获得106倍的最高加速。

关键词: SW26010, Floyd算法, 并行计算, 数组划分

Abstract: The Floyd algorithm for finding shortest paths in a weighted graph is a key building block which is used frequently in a variety of practical applications.However,the Floyd algorithm cannot scale to large-scale graphs due to its time complexity.Its parallel implementations for different architectures are thus proposed and have been proved effective.To address the mismatching between existing ineffective parallel implementation of the Floyd algorithm and domestically designed processors,this paper implements and optimizes the Floyd algorithm targeting the Sunway platform.More specifically,this paper implements the algorithm using the programming model designed for the heterogeneous architecture of the Sunway TaihuLight,and captures the performance bottleneck when executed on the target.This paper next improves the performance of the Floyd algorithm by means of algorithmic optimization,array partitioning and double buffering.The experimental results show that the implementation of the Floyd algorithm on the Sunway platform can achieve the highest speedup of 106X over the sequential version executed on the managing processing element of the SW26010 processor.

Key words: SW26010, Floyd algorithm, Parallel computing, Array partitioning

中图分类号: 

  • TP391
[1] DOR D,HALPERIN S,ZWICK U.All pairs almost shortest paths[C]//SIAM Journal on Computing.1996,29(5):452-461.
[2] CHAN T M.More algorithms for all-pairs shortest paths inweighted graphs[J].Proceedings of the Annual ACM Sympo-sium on Theory of Computing,2010,39:2075-2089.
[3] LI J W,ZHANG J,ZHAO J C,et al.A Load Balancing Shortest Path Routing Algorithm for SRIO Network[J].Computer Engineering,2020,46(3):214-221,228.
[4] DONGARRA J.Report on the sunway taihulight system[D].Knoxville:University of Tennessee,2016.
[5] LIU X,GUO H,SUN R J,et al.The Characteristic Analysis and Exascale Scalability Research of Large Scale Parallel Applications on Sunway TaihuLight Supercomputer[J].Chinese Journal of Computers,2018,41(10):2209-2220.
[6] LI M,YANG C,SUN Q,et al.Enabling highly efficient k-Means computations on the SW26010 many-core processor of sunway taihulight[J].Journal of Computer Science & Technology,2019,34(1):77-93.
[7] NI H,LIU X.Multi-Core optimization technology of unstructured grid based on Sunway TaihuLight[J].Computer Engineering,2019,45(6):45-51.
[8] XU Z,LIN J,MATSUOKA S.Benchmarking SW26010 many-core processor[C]//Parallel & Distributed Processing Sympo-sium Workshops.IEEE,2017.
[9] FLOYD R W.Algorithm 97,Shortest path algorithms[J].Communications of the ACM,1962,5(6):345.
[10] VENKATARAMAN G,SAHNI S,MUKHOPADHYAYA S.A blocked all-pairs shortest-paths algorithm[C]//Scandinavian Workshop on Algorithm Theory.Berlin,Heidelberg:Springer,2000.
[11] ZHANG D Q,WU G L,LIU D F.Accelerated and OptimizedMethod of Floyd Algorithm to Find out Shortest Path[J].Computer Engineering and Applications,2009(17):45-47,50.
[12] ZUO X F,SHEN W J.Improved Algorithm about Multi-shortest Path Problem Based on Floyd Algorithm[J].Computer Science,2017,44(5):238-240,273.
[13] LU L G,LIU L Y,LU T D,et al.A Modified Floyd Algorithm[J].Journal of East China University of Technology,2019,42(1):81-84.
[14] SRINIVASAN T,BALAKRISHNAN R,GANGADHARAN S A,et al.A scalable parallelization of all-pairs shortest path algorithm for a high performance cluster environment[C]//International Conference on Parallel & Distributed Systems.IEEE,2007.
[15] TESKEREDZIC E,KARAHODZIC K,NOSOVIC N.Comparison of the non-blocked and blocked floyd-warshall algorithm with regard to speedup and energy saving on an embedded GPU[C]//19th International Symposium INFOTEH-JAHORINA.2020:18-20.
[16] BONDHUGULA U,DEVULAPALLI A,FERNANDO J,et al.Parallel FPGA-based all-pairs shortest-paths in a directed graph[C]//20th International Parallel and Distributed Processing Symposium(IPDPS 2006).IEEE,2006.
[17] XING X X,ZHAO G X,LUO Z Y,et al.GPU-based Algorithm of Shortest Path[J].Computer Science,2012,39(3):299-303.
[18] FOSTER I.Designing and building parallel programs:concepts and tools for parallel software engineering[J].Tetrahedron Letters,1995,11(3):296-300.
[1] 傅天豪, 田鸿运, 金煜阳, 杨章, 翟季冬, 武林平, 徐小文. 一种面向构件化并行应用程序的性能骨架分析方法[J]. 计算机科学, 2021, 48(6): 1-9.
[2] 朱雨, 庞建民, 徐金龙, 陶小涵, 王军. 面向SW26010处理器的三维Stencil自适应分块参数算法[J]. 计算机科学, 2021, 48(6): 10-18.
[3] 冯凯, 马鑫玉. (n,k)-冒泡排序网络的子网络可靠性[J]. 计算机科学, 2021, 48(4): 43-48.
[4] 马梦宇, 吴烨, 陈荦, 伍江江, 李军, 景宁. 显示导向型的大规模地理矢量实时可视化技术[J]. 计算机科学, 2020, 47(9): 117-122.
[5] 陈国良, 张玉杰. 并行计算学科发展历程[J]. 计算机科学, 2020, 47(8): 1-4.
[6] 阳王东, 王昊天, 张宇峰, 林圣乐, 蔡沁耘. 异构混合并行计算综述[J]. 计算机科学, 2020, 47(8): 5-16.
[7] 袁欣辉, 林蓉芬, 魏迪, 尹万旺, 徐金秀. 面向国产异构众核处理器SW26010的BFS优化方法[J]. 计算机科学, 2020, 47(8): 98-104.
[8] 冯凯, 李婧. k元n方体的子网络可靠性研究[J]. 计算机科学, 2020, 47(7): 31-36.
[9] 吕小敬, 刘钊, 褚学森, 石树鹏, 孟虹松, 黄震春. 面向超大规模并行模拟的LBM计算流体力学软件[J]. 计算机科学, 2020, 47(4): 13-17.
[10] 杨宗霖, 李天瑞, 刘胜久, 殷成凤, 贾真, 珠杰. 基于Spark Streaming的流式并行文本校对[J]. 计算机科学, 2020, 47(4): 36-41.
[11] 邓定胜. 一种改进的DBSCAN算法在Spark平台上的应用[J]. 计算机科学, 2020, 47(11A): 425-429.
[12] 徐传福,王曦,刘舒,陈世钊,林玉. 基于Python的大规模高性能LBM多相流模拟[J]. 计算机科学, 2020, 47(1): 17-23.
[13] 徐磊, 陈荣亮, 蔡小川. 基于非结构化网格的高可扩展并行有限体积格子[J]. 计算机科学, 2019, 46(8): 84-88.
[14] 倪鸿, 刘鑫. 非结构网格下稀疏下三角方程求解器众核优化技术研究[J]. 计算机科学, 2019, 46(6A): 518-522.
[15] 陶小涵, 庞建民, 高伟, 王琦, 姚金阳. 基于SW26010处理器的FT程序的性能优化[J]. 计算机科学, 2019, 46(4): 321-328.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 孙启,金燕,何琨,徐凌轩. 用于求解混合车辆路径问题的混合进化算法[J]. 计算机科学, 2018, 45(4): 76 -82 .
[2] 赵利博,刘奇,付方玲,何凌. 基于小波变换和倒谱分析的腭裂高鼻音等级自动识别[J]. 计算机科学, 2018, 45(4): 278 -284 .
[3] 戴文静, 袁家斌. 隐含子群问题的研究现状[J]. 计算机科学, 2018, 45(6): 1 -8 .
[4] 郑香平, 於志勇, 温广槟. 地点网络中的社区发现[J]. 计算机科学, 2018, 45(6): 46 -50 .
[5] 朱文强. 面向O2O服务的移动社交网络个性化可信群体识别模型[J]. 计算机科学, 2018, 45(6): 76 -83 .
[6] 池凯凯, 林一民, 李燕君, 程珍. 能量捕获传感网中吞吐量最大化的占空比方案[J]. 计算机科学, 2018, 45(6): 100 -104 .
[7] 陈福才, 李思豪, 张建朋, 黄瑞阳. 基于标签关系改进的多标签特征选择算法[J]. 计算机科学, 2018, 45(6): 228 -234 .
[8] 徐丽丽,董一鸿,潘剑飞,陈华辉. 面向复杂网络的图稀疏算法综述[J]. 计算机科学, 2018, 45(5): 24 -30 .
[9] 刘盼,李华康,孙国梓. 基于短时多源回归算法的P2P平台风险观测方法[J]. 计算机科学, 2018, 45(5): 97 -101 .
[10] 艾拓,梁亚玲,杜明辉. 基于难负样本挖掘的改进Faster RCNN训练方法[J]. 计算机科学, 2018, 45(5): 250 -254 .