计算机科学 ›› 2022, Vol. 49 ›› Issue (11A): 210800189-7.doi: 10.11896/jsjkx.210800189
黄佳为, 李晓鹏, 凌诚
HUANG Jia-wei, LI Xiao-peng, LING Cheng
摘要: 贝叶斯与Metropolis-Hastings算法的高效实现让MrBayes成为使用广泛的分子序列系统发育分析工具。然而,分子序列与进化参数的增加导致候选分子树样本空间急剧扩大,使得系统发育树的重构工作面临巨大计算挑战。为降低MrBayes系统发育分析中分子树条件似然概率的计算时间,提高分析效率,近年来出现一批基于图形处理器(GPU)的并行加速方法。为提高并行方法的可扩展性,提出了一种优化的似然概率多线程并行计算方法。根据位点间可变进化速率模型中分子状态似然概率的计算需要对应不同转移概率矩阵,将前期使用多线程对不同位点似然概率的并行计算,进一步分解为多位点间不同转移概率矩阵下的条件似然概率的计算。该策略在不改变单个线程计算传输比的基础上,通过增加线程数量,优化了线程warp间的并行重叠度,提高了并行效率。此外,由于每个线程warp只计算同一种转移概率矩阵下的似然概率,避免了在使用共享内存时不同warp间的同步开销,进一步提升了内核计算效率。所提方法与前期方法在4组实际数据和30组模拟数据上的计算结果表明,在核心似然函数的计算加速上,本文取得的计算性能超过tgMC3(2.0版)和nMC3(2.1.1版)方法,最高达1.78和2.04倍。
中图分类号:
[1]ARROWSMITH C,BOUNTRA C,FISH P,et al.Epigeneticprotein families:a new frontier for drug discovery.Nature Reviews Drug Discovery,2012,11(5):384-400. [2]RAJAPAKSA S,RASANJANA W,PERERA I,et al.GPU Accelerated Maximum Likelihood Analysis for Phylogenetic Infe-rence[P].Software and Computer Applications,2019. [3]GUINDON S,GASCUEL O.A Simple,Fast,and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood[J].Systematic Biology,2003,52(5):696-704. [4]YANG Z,RANNALA B.Bayesian phylogenetic inference using DNA sequences:a markov chain monte carlo method[J].Mole-cular Biology & Evolution,1997(7):717-724. [5]HASTINGS W K.Monte Carlo Sampling Methods Using Mar-kov Chains and Their Applications[J].Biometrika,1970,57(1):97-107. [6]GREEN P J.Reversible Jump Markov Chain Monte Carlo Computation andBayesian Model Determination[J].Biometrika,1995,82(4):711-732. [7]METROPOLIS N,ROSENBLUTH A W,ROSENBLUTH MN,et al.Equation of State by Fast Computing Machines[J].Journal of Chemical Physics,1953,21(6):1087-1092. [8]PRATAS F,TRANCOSO P,STAMATAKIS A,et al.Fine-grain parallelism using multi-core,Cell/BE,and GPU systems:Accelerating the phylogenetic likelihood function[C]//2009 International Conference on Parallel Processing.IEEE,2009:9-17. [9]PANG S A,STONES R J,REN M M,et al.GPU MrBayes V3.1:MrBayes on Graphics Processing Units for Protein Sequence Data.[J].Molecular Biology and Evolution,2015,32(9):2496-2497. [10]ZHAO M,REN Q,WANG Y,et al.A Three-Level Parallel Algorithm for MrBayes 3.2[C]//2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications(ISPA/IUCC).IEEE,2017:1246-1250. [11]PRATAS F,TRANCOSO P,SOUSA L,et al.Fine-grain parallelism using multi-core,Cell/BE,and GPU Systems[J].Parallel Computing,2012,38(8):365-390. [12]ZHOU J F,LIU X G,STONES D S,et al.MrBayes on a Gra-phics Processing Unit[J].Bioinformatics,2011,27(9):1-7. [13]BAO J,XIA H J,ZHOU J F,et al.Efficient implementation of MrBayes on multi-GPU[J].Molecular Biology and Evolution,2013,30(6):1471-1479. [14]LING C,HAMADA T,BAI J N,et al.MrBayes tgMC3:ATight GPU Implementation of MrBayes[J].PLOS ONE,2013,8(4):1-9. [15]LING C,HAMADA T,GAO J Y,et al.MrBayes tgMC3++:A High Performance and Resource-Efficient GPU-Oriented Phylogenetic Analysis Method[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2016,13(5):1-9. [16]KUAN L,PRATAS F,SOUSA L,et al.MrBayes sMC3:Accelerating Bayesian inference of phylogenetic trees[J].The International Journal of High Performance Computing Applications,2018,32(2):754-755. [17]AYRES D L,DARLING A,ZWICKL D J,et al.BEAGLE:An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics[J].Systematic Bio-logy,2012,61(1):170-173. [18]AYRES DANIEL L,CUMMINGS MICHAEL P,BAELEG,et al.BEAGLE 3:Improved Performance,Scaling,and Usability for a High-Performance Computing Library for Statistical Phylogenetics.[J].Systematic Biology,2019,68(6):1052-1061. [19]YANG Z.Maximum likelihood phylogenetic estimation fromDNA sequences with variable rates over sites:approximate methods[J].Journal of molecular evolution,1994,39(3):306-314. [20]ANDREW R,NICHOLAS C.Grass.Seq-Gen:an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees[J].Bioinformatics,1997,13(3):235-238. |
|