Computer Science, 2021, Vol. 48, Issue (6): 34-40. doi: 10.11896/jsjkx.201100051

• Computer Architecture •

Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform

HE Ya-ru1, PANG Jian-min1,2, XU Jin-long2, ZHU Yu2, TAO Xiao-han2   

  1 Zhong Yuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450000, China
  2 School of Cyberspace Security, Information Engineering University, Zhengzhou 450000, China
  • Received: 2020-11-05 Revised: 2021-03-21 Online: 2021-06-15 Published: 2021-06-03
  • About author: HE Ya-ru, born in 1994, postgraduate. Her main research interests include high-performance computing.
    PANG Jian-min, born in 1964, professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include high-performance computing and information security.
  • Supported by:
    Major Scientific Project of Zhejiang Lab Advanced Industrial Network Security Platform(2018FD0ZX01).

Abstract: The Floyd algorithm, which finds the shortest paths between all pairs of vertices in a weighted graph, is a key building block in many practical applications. However, its cubic time complexity prevents it from scaling to large graphs. Parallel implementations for a variety of architectures have therefore been proposed and shown to be effective. Existing parallel implementations, however, are poorly matched to domestically designed processors and run inefficiently on them. To address this mismatch, this paper implements and optimizes the Floyd algorithm for the Sunway platform. Specifically, the algorithm is first implemented with the programming model designed for the heterogeneous architecture of the Sunway TaihuLight, and its performance bottleneck on the target is identified. Performance is then improved through algorithmic optimization, array partitioning, and double buffering. Experimental results show that the optimized implementation achieves a speedup of up to 106× over the sequential version executed on the management processing element of the SW26010 processor.
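For readers unfamiliar with the algorithm, the following is a minimal Python sketch of the baseline Floyd algorithm and of a tiled variant. It is not the authors' Sunway implementation: the abstract does not describe the partitioning scheme, so the tiled version below is the textbook blocked Floyd-Warshall, shown only as one common way to split the distance matrix into tiles small enough for a fast local memory. The names `floyd`, `blocked_floyd`, `_relax_tile`, and the tile size `b` are hypothetical, and DMA transfers and double buffering are not modeled. Both functions assume a graph without negative cycles and a zero diagonal.

```python
import math

INF = math.inf  # marks "no edge" in the distance matrix


def floyd(dist):
    """Baseline Floyd algorithm: O(n^3) triple loop over a copy of dist."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            dik = d[i][k]
            for j in range(n):
                cand = dik + d[k][j]
                if cand < d[i][j]:
                    d[i][j] = cand
    return d


def _relax_tile(d, bi, bj, bk, b, n):
    """Relax tile (bi, bj) through the intermediate vertices of block bk."""
    for k in range(bk * b, min((bk + 1) * b, n)):
        for i in range(bi * b, min((bi + 1) * b, n)):
            dik = d[i][k]
            for j in range(bj * b, min((bj + 1) * b, n)):
                cand = dik + d[k][j]
                if cand < d[i][j]:
                    d[i][j] = cand


def blocked_floyd(dist, b):
    """Textbook blocked Floyd-Warshall with b x b tiles (illustrative only)."""
    n = len(dist)
    d = [row[:] for row in dist]
    nb = (n + b - 1) // b  # number of tiles per dimension
    for bk in range(nb):
        # Phase 1: the diagonal tile depends only on itself.
        _relax_tile(d, bk, bk, bk, b, n)
        # Phase 2: tiles sharing a row or column with the diagonal tile.
        for x in range(nb):
            if x != bk:
                _relax_tile(d, bk, x, bk, b, n)
                _relax_tile(d, x, bk, bk, b, n)
        # Phase 3: all remaining tiles; these are mutually independent,
        # which is where a many-core processor could work in parallel.
        for bi in range(nb):
            for bj in range(nb):
                if bi != bk and bj != bk:
                    _relax_tile(d, bi, bj, bk, b, n)
    return d
```

In this sketch each tile update touches at most three tiles at a time, which is the property that makes tiling attractive when the per-core local memory is small; the tile size that works best on a given processor is a tuning question the sketch does not address.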

Key words: Array partitioning, Floyd algorithm, Parallel computing, SW26010

CLC Number: TP391