计算机科学 ›› 2022, Vol. 49 ›› Issue (11A): 210800189-7.doi: 10.11896/jsjkx.210800189

• 交叉&应用 • 上一篇    下一篇

一种基于GPU的核苷酸分子系统发育树条件似然概率可扩展并行计算方法

黄佳为, 李晓鹏, 凌诚   

  1. 北京化工大学信息科学与技术学院 北京 100000
  • 出版日期:2022-11-10 发布日期:2022-11-21
  • 通讯作者: 凌诚(lingcheng@mail.buct.edu.cn)
  • 作者简介:(867454915@qq.com)
  • 基金资助:
    国家自然科学基金(61602026)

Scalable Parallel Computing Method for Conditional Likelihood Probability of Nucleotide Molecular Phylogenetic Tree Based on GPU

HUANG Jia-wei, LI Xiao-peng, LING Cheng   

  1. School of Information Science and Technology,Beijing University of Chemical Technology,Beijing 100000,China
  • Online:2022-11-10 Published:2022-11-21
  • About author:HUANG Jia-wei,born in 1996,postgraduate.His main research interests include high-performance GPU computing in bioinformatics and so on.
    LING Cheng,born in 1987,Ph.D,associate professor.His main research inte-rests include high-performance GPU computing on computational biology and bioinformatics.
  • Supported by:
    National Natural Science Foundation of China(61602026).

摘要: 贝叶斯与Metropolis-Hastings算法的高效实现让MrBayes成为使用广泛的分子序列系统发育分析工具。然而,分子序列与进化参数的增加导致候选分子树样本空间急剧扩大,使得系统发育树的重构工作面临巨大计算挑战。为降低MrBayes系统发育分析中分子树条件似然概率的计算时间,提高分析效率,近年来出现一批基于图形处理器(GPU)的并行加速方法。为提高并行方法的可扩展性,提出了一种优化的似然概率多线程并行计算方法。根据位点间可变进化速率模型中分子状态似然概率的计算需要对应不同转移概率矩阵,将前期使用多线程对不同位点似然概率的并行计算,进一步分解为多位点间不同转移概率矩阵下的条件似然概率的计算。该策略在不改变单个线程计算传输比的基础上,通过增加线程数量,优化了线程warp间的并行重叠度,提高了并行效率。此外,由于每个线程warp只计算同一种转移概率矩阵下的似然概率,避免了在使用共享内存时不同warp间的同步开销,进一步提升了内核计算效率。所提方法与前期方法在4组实际数据和30组模拟数据上的计算结果表明,在核心似然函数的计算加速上,本文取得的计算性能超过tgMC3(2.0版)和nMC3(2.1.1版)方法,最高达1.78和2.04倍。

关键词: MrBayes, 似然计算, GPU, 并行计算, CUDA编程

Abstract: The efficient implementation of Bayesian and Metropolis Hastings algorithms makes Mrbayes a widely used tool for molecular sequence phylogenetic analysis.However,the increase of molecular sequences and evolutionary parameters leads to the rapid expansion of the sample space of candidate molecular trees,which makes the reconstruction of phylogenetic trees face great computational challenges.In order to reduce the calculation time of conditional likelihood probability of molecular tree in mrbayes phylogenetic analysis and improve the analysis efficiency,a number of parallel acceleration methods based on graphics processor(GPU) have emerged in recent years.In order to improve the scalability of parallel methods,an optimized likelihood probability multithreaded parallel computing method is proposed in this paper.As the calculation of molecular state likelihood probability in the variable evolution rate model between sites needs to correspond to different transition probability matrices,this method further decomposes the parallel calculation of likelihood probability of different sites using multithreading into the calculation of conditional likelihood probability under different transition probability matrices between multiple sites.This strategy optimizes the parallel overlap between threads and improves the parallel efficiency by increasing the number of threads without changing the calculation transmission ratio of a single thread.In addition,because each thread warp only calculates the likelihood probability under the same transition probability matrix,it avoids the synchronization overhead between different warps when using shared memory,and further improves the computing efficiency of the kernel.Calculation results of 4 groups of actual data and 30 groups of simulated data show that the computational performance of this method is 1.78 and 2.04 times higher than that of tgMC3(version 2.0) and nMC3(version 2.1.1) in the calculation acceleration of core likelihood function.

Key words: MrBayes, Likelihood computing, GPU, Parallel computing, CUDA

中图分类号: 

  • TP399
[1]ARROWSMITH C,BOUNTRA C,FISH P,et al.Epigeneticprotein families:a new frontier for drug discovery.Nature Reviews Drug Discovery,2012,11(5):384-400.
[2]RAJAPAKSA S,RASANJANA W,PERERA I,et al.GPU Accelerated Maximum Likelihood Analysis for Phylogenetic Infe-rence[P].Software and Computer Applications,2019.
[3]GUINDON S,GASCUEL O.A Simple,Fast,and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood[J].Systematic Biology,2003,52(5):696-704.
[4]YANG Z,RANNALA B.Bayesian phylogenetic inference using DNA sequences:a markov chain monte carlo method[J].Mole-cular Biology & Evolution,1997(7):717-724.
[5]HASTINGS W K.Monte Carlo Sampling Methods Using Mar-kov Chains and Their Applications[J].Biometrika,1970,57(1):97-107.
[6]GREEN P J.Reversible Jump Markov Chain Monte Carlo Computation andBayesian Model Determination[J].Biometrika,1995,82(4):711-732.
[7]METROPOLIS N,ROSENBLUTH A W,ROSENBLUTH MN,et al.Equation of State by Fast Computing Machines[J].Journal of Chemical Physics,1953,21(6):1087-1092.
[8]PRATAS F,TRANCOSO P,STAMATAKIS A,et al.Fine-grain parallelism using multi-core,Cell/BE,and GPU systems:Accelerating the phylogenetic likelihood function[C]//2009 International Conference on Parallel Processing.IEEE,2009:9-17.
[9]PANG S A,STONES R J,REN M M,et al.GPU MrBayes V3.1:MrBayes on Graphics Processing Units for Protein Sequence Data.[J].Molecular Biology and Evolution,2015,32(9):2496-2497.
[10]ZHAO M,REN Q,WANG Y,et al.A Three-Level Parallel Algorithm for MrBayes 3.2[C]//2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications(ISPA/IUCC).IEEE,2017:1246-1250.
[11]PRATAS F,TRANCOSO P,SOUSA L,et al.Fine-grain parallelism using multi-core,Cell/BE,and GPU Systems[J].Parallel Computing,2012,38(8):365-390.
[12]ZHOU J F,LIU X G,STONES D S,et al.MrBayes on a Gra-phics Processing Unit[J].Bioinformatics,2011,27(9):1-7.
[13]BAO J,XIA H J,ZHOU J F,et al.Efficient implementation of MrBayes on multi-GPU[J].Molecular Biology and Evolution,2013,30(6):1471-1479.
[14]LING C,HAMADA T,BAI J N,et al.MrBayes tgMC3:ATight GPU Implementation of MrBayes[J].PLOS ONE,2013,8(4):1-9.
[15]LING C,HAMADA T,GAO J Y,et al.MrBayes tgMC3++:A High Performance and Resource-Efficient GPU-Oriented Phylogenetic Analysis Method[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2016,13(5):1-9.
[16]KUAN L,PRATAS F,SOUSA L,et al.MrBayes sMC3:Accelerating Bayesian inference of phylogenetic trees[J].The International Journal of High Performance Computing Applications,2018,32(2):754-755.
[17]AYRES D L,DARLING A,ZWICKL D J,et al.BEAGLE:An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics[J].Systematic Bio-logy,2012,61(1):170-173.
[18]AYRES DANIEL L,CUMMINGS MICHAEL P,BAELEG,et al.BEAGLE 3:Improved Performance,Scaling,and Usability for a High-Performance Computing Library for Statistical Phylogenetics.[J].Systematic Biology,2019,68(6):1052-1061.
[19]YANG Z.Maximum likelihood phylogenetic estimation fromDNA sequences with variable rates over sites:approximate methods[J].Journal of molecular evolution,1994,39(3):306-314.
[20]ANDREW R,NICHOLAS C.Grass.Seq-Gen:an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees[J].Bioinformatics,1997,13(3):235-238.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!