Computer Science ›› 2015, Vol. 42 ›› Issue (1): 82-85.doi: 10.11896/j.issn.1002-137X.2015.01.019

Previous Articles     Next Articles

Implementation of Depth First Search Parallel Algorithm on Cluster of GPUs

YU Ying, LI Ken-li and ZHENG Guang-yong   

  • Online:2018-11-14 Published:2018-11-14

Abstract: Straightforward implementation of depth first search algorithm for large graph on GPU cluster,may lead to load imbalance between threads and un-coalesced memory accesses,giving rise to the low performance of the algorithm.In order to achieve improvement of the performance in a single GPU and multi-GPUs environment,a series of effective operations were used to reschedule before processing the data.A novel strategy for mapping between threads and data was proposed,and by using the prefix sum and binary search operations, load balancing was achieved perfectly.In order to reduce the communication overhead,we performed pruning operation on the set of edges which needs to be exchanged at all branches of DFS.Experimental results show that the algorithm can achieve its best parallelism available on a single GPU and minimize communication overhead among GPUs.GPU cluster can effectively perform the distributed DFS on graphs which contain billions of nodes.

Key words: GPU,DFS,Distributed algorithm,CUDA,MPI

[1] 王海峰,陈庆奎.图形处理器通用计算关键技术研究综述[J].计算机学报,2013,6(4):757-772
[2] Daniel B,Maxim L,Carlos Linares L,et al.Anytime AND/OR depth-first search for combinatorial optimization[J].AI Communications,2012,5(3):211-227
[3] Hera J-H,Ramakrishna R S.An external-memorydepth-firstsearch algorithm for general grid graphs[J].Theoretical Computer Science,2007,4(1-3):170-180
[4] Hong S,Oguntebi T,Olukotun K.Efficientparallel graph exploration on multi-core cpu and gpu[C]∥2011 International Conference on Parallel Architectures and Compilation Techniques(PACT),2011:78-88
[5] Grimshaw A,Merrill D,Garland M.High Performance and Scalable GPU Graph Traversal[R].Technical Report,Nvidia,2011
[6] 吴鸿伟,汤伟宾,李晓潮,等.GPU编程原理及其在网络安全领域的应用算法分析[J].计算机科学,2012,9(11):24-27
[7] AlfonsLaarman,van de Pol J.Variations on Multi-Core Nested Depth-First Search[J].Electronic Proceedings in Theoretical Computer Science,2011,2:13-28
[8] Chao Li-wen,Sen Yan-hong.Depth-first search algorithm based on special properties of PFSP[J].Control Decis,2009,4(8):1203-1208
[9] Mastrostefano E,Bernaschi M.Efficient breadth first search on multi-GPU systems[J].Journal of Parallel and Distributed Computing,2013,3(9):1292-1305
[10] 张庆科,杨波,王琳,等.基于GPU的现代并行优化算法[J].计算机科学,2012,9(4):304-311
[11] Harish P,Narayanan P.Accelerating large graph algorithms on the gpu using cuda[C]∥Aluru S,Parashar M,Badrinath R,et al.,eds.H igh Performance Computing-HiPC2007.Heidelberg:Springer,Berlin,2007:197-208
[12] Agarwal V,Petrini F,Pasetto D,et al.Scalable graph exploration on multicore processors[C]∥2010 International Conference for High Performance Computing,Networking,Storage and Analysis.SC:2010:1-11
[13] Leskovec J,Chakrabarti D,Kleinberg J,et al.Kronecker graphs:an approach to modeling networks[J].Journal of Machine Learning Research,2010,1:985-1042
[14] 卢风顺,宋君强,银福康,等.CPU/GPU协同并行计算研究综述[J].计算机科学,2011,8(3):5-9,6

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!