Computer Science, 2017, Vol. 44, Issue (3): 20-22. doi: 10.11896/j.issn.1002-137X.2017.03.005

Accelerating Gene Clustering on Heterogeneous Clusters

WEI Jian-wen, XU Zhi-geng, WANG Bing-qiang, Simon SEE and James LIN   

  • Online: 2018-11-13 Published: 2018-11-13

Abstract: Metagenome clustering is a novel approach to detecting flawed genes. It relies on massive gene data, effective clustering algorithms, and an efficient implementation. Calculating the correlation matrix is the essential step in clustering and accounts for most of the computing time. Taking a gene repository with 1300 samples and a million genes as an example, clustering would take about 27 years. Developing an efficient implementation of the correlation matrix calculation is therefore the top priority. After analyzing the algorithms, we proposed and applied several optimizations. First, we implemented an efficient multithreaded version using OpenMP dynamic scheduling. Second, we further improved performance by making efficient use of the cache on the CPU and of shared memory on the GPU. Third, we implemented a load-balanced work distribution that works well for the MPI program on CPUs. Compared with the unoptimized single-threaded CPU program, the two fastest implementations, MPI+OpenMP on 256 CPU cores and MPI+CUDA on 6 GPU cards, achieve speedups of 238.8 and 263.8, respectively.

Key words: Gene clustering, Heterogeneous computing, Cache optimization, Load balance
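
The abstract does not give the correlation measure or any code, so the following is only a minimal Python sketch of how the described steps might fit together, not the authors' implementation. It assumes Pearson correlation between gene abundance profiles, computes only the upper triangle in small tiles (a stand-in for the cache and shared-memory blocking), and lets a thread pool hand tiles to whichever worker is free (a stand-in for OpenMP dynamic scheduling). All function names and the `tile` and `workers` parameters are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def standardize(row):
    """Center one gene's profile across samples and scale it to unit length."""
    n = len(row)
    mean = sum(row) / n
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(c * c for c in centered))
    # A constant profile has no defined correlation; this sketch maps it to zeros.
    return [c / norm for c in centered] if norm > 0 else [0.0] * n


def correlate_tile(z, i0, j0, tile):
    """Correlations for one tile of the upper triangle (rows from i0, columns from j0)."""
    n = len(z)
    out = []
    for i in range(i0, min(i0 + tile, n)):
        # On a diagonal tile, start at column i so each pair is computed once.
        for j in range(max(j0, i), min(j0 + tile, n)):
            out.append((i, j, sum(a * b for a, b in zip(z[i], z[j]))))
    return out


def correlation_matrix(genes, tile=2, workers=4):
    """Pearson correlation matrix of the rows of `genes`, computed tile by tile."""
    z = [standardize(g) for g in genes]
    n = len(z)
    corr = [[0.0] * n for _ in range(n)]
    # Diagonal tiles hold fewer pairs than the others, so tile costs are uneven;
    # idle workers pull the next tile from the pool's queue, much like
    # schedule(dynamic) in OpenMP.
    tiles = [(i0, j0) for i0 in range(0, n, tile) for j0 in range(i0, n, tile)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(correlate_tile, z, i0, j0, tile) for i0, j0 in tiles]
        for future in futures:
            for i, j, r in future.result():
                corr[i][j] = corr[j][i] = r  # the matrix is symmetric
    return corr
```

Calling `correlation_matrix(profiles)` on a list of equal-length numeric lists returns a symmetric list of lists. Pure-Python threads give no real speedup here; the sketch only shows the work decomposition, which a compiled OpenMP, MPI, or CUDA version would apply to much larger tiles.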

