Computer Science (计算机科学), 2017, Vol. 44, Issue (3): 20-22. doi: 10.11896/j.issn.1002-137X.2017.03.005
WEI Jian-wen, XU Zhi-geng, WANG Bing-qiang, Simon SEE and James LIN
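Abstract: Metagenomic gene clustering is a new approach to screening for disease-causing genes. It depends on massive sequencing data, an effective clustering algorithm, and efficient computers. Computing the correlation coefficient matrix is a required step before clustering and accounts for a large share of the total computation. For one gene catalog, with 1300 samples and about one million genes per sample, a single-threaded run would take 27 years. Exploiting the full potential of multi-core CPUs, harnessing the computing power of GPU accelerator cards, and scaling the program to multi-node clusters is therefore important and urgent work. Based on a careful analysis of the algorithm, we first built efficient implementations for a single CPU node and a single GPU card, obtaining near-ideal speedups. We then applied cache optimization to improve performance further. Finally, we used a load-balancing method to distribute computing tasks among MPI processes, which achieved good scalability. Compared with the unoptimized single-threaded program, 16 CPU nodes achieved a 238.8x speedup, and 6 GPU cards achieved a 263.8x speedup.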
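The abstract does not say which correlation coefficient is used or how tasks are balanced, so the following is only a minimal sketch under stated assumptions: Pearson correlation, a square tiling of the symmetric matrix as a stand-in for the cache optimization, and a simple round-robin deal of tiles as a stand-in for the load-balancing method. All function names and parameters here are hypothetical, and the "workers" run sequentially in one process rather than as MPI processes or GPU kernels.

```python
import math
from itertools import combinations_with_replacement


def standardize(profile):
    """Center a profile and scale it to unit length, so that the dot product
    of two standardized profiles equals their Pearson correlation."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("a constant profile has no defined correlation")
    return [c / norm for c in centered]


def upper_tiles(num_genes, tile):
    """List (row_start, col_start) for every tile on or above the diagonal;
    the matrix is symmetric, so the lower half never needs computing."""
    starts = range(0, num_genes, tile)
    return list(combinations_with_replacement(starts, 2))


def assign_tiles(tiles, num_workers):
    """Assumed balancing scheme: deal tiles round-robin so that every worker
    receives nearly the same number of tiles."""
    return [tiles[w::num_workers] for w in range(num_workers)]


def correlate_tiles(z, tiles, tile, out):
    """Fill the entries covered by the given tiles. Working tile by tile
    reuses the same few profiles while they are still in cache."""
    n = len(z)
    for r0, c0 in tiles:
        for i in range(r0, min(r0 + tile, n)):
            for j in range(max(c0, i), min(c0 + tile, n)):
                r = sum(a * b for a, b in zip(z[i], z[j]))
                out[i][j] = out[j][i] = r


def correlation_matrix(profiles, tile=2, num_workers=2):
    """Pearson correlation between every pair of gene profiles
    (one profile per gene, one value per sample)."""
    z = [standardize(p) for p in profiles]
    n = len(z)
    out = [[0.0] * n for _ in range(n)]
    for share in assign_tiles(upper_tiles(n, tile), num_workers):
        # In a cluster run, each share would go to a separate process.
        correlate_tiles(z, share, tile, out)
    return out


if __name__ == "__main__":
    genes = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 4]]
    for row in correlation_matrix(genes):
        print(["%.3f" % v for v in row])
```

Standardizing each profile once turns every matrix entry into a plain dot product, which is why this step maps well onto dense matrix multiplication on CPUs and GPUs. With N genes there are N(N+1)/2 distinct entries, which is what makes the million-gene case described above so costly.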