Computer Science ›› 2023, Vol. 50 ›› Issue (12): 322-329.doi: 10.11896/jsjkx.221100057

• Artificial Intelligence • Previous Articles     Next Articles

Mixed Path HMC Sampling Methods for Molecular Tree Spaces

LI Xiaopeng1, LING Cheng2, GAO Jingyang1   

  1. 1 School of Information Science and Technology,Beijing University of Chemical Technology,Beijing 100000,China
    2 China Overseas International Center,Advanced Micro Devices,Inc(AMD),Beijing 100000,China
  • Received:2022-11-07 Revised:2023-03-14 Online:2023-12-15 Published:2023-12-07
  • About author:LI Xiaopeng,born in 1998,postgra-duate.His main research interest is computational phylogenetics.
    GAO Jingyang,born in 1966,Ph.D,professor,Ph.D supervisor.Her main research interests include artificial intelligence and bioinformatics.
  • Supported by:
    Natural Science Foundation of Beijing(5182018).

Abstract: With the increasing abundance of modern molecular sequence data and the dramatic expansion of the tree-like topological space describing historical relationships between species,reliable inference of phylogenetic trees continues to face enormous challenges.In recent years,the most advanced Hamiltonian Markov Monte Carlo(HMC) algorithm in the Markov Chain Monte Carlo(MCMC) family has been shown to be applicable to phylogenetic analysis,which can avoid the large amount of random walk behaviors present in traditional MCMC algorithms and speed up the mixing of Markov chains.However,in the more complex multimodal development tree space,the HMC algorithm cannot escape from the local high probability region by obtaining propo-sals from other modes.In order to improve the robustness of the algorithm,a hybrid path Hamiltonian Markov Monte Carlo(MPHMC) optimization strategy is proposed in this paper.Without adding additional computational cost,the algorithm samples paths with a non-HMC update component for discrete parameters,alternating with HMC deterministic updates,and introduces a branch rearrangement strategy with greater topological variation in the tree space,enabling freer traversal of the entire posterior distribution's tree space.Experiments on five empirical datasets demonstrate that the MPHMC method better samples from the correct posterior distribution,and the HMC single-path sampling algorithm may fail when run on larger datasets that are more difficult to sample,while the MPHMC method achieves a sampling efficiency gain over 14% than the widely used phylogenetic analysis tool,Mrbayes(MCMC).

Key words: MrBayes, Tree space, HMC, Multimodal posterior distribution, Mixed path

CLC Number: 

  • TP399
[1]KAPLI P,YANG Z,TELFORD M J.Phylogenetic tree building in the genomic age[J].Nature Reviews Genetics,2020,21(7):428-444.
[2]YANG Z.Computational molecular evolution[M].Oxford:Oxford University Press,2006.
[3]HUELSENBECK J P,RONQUIST F.MRBAYES:Bayesian inference of phylogenetic trees[J].Bioinformatics,2001,17(8):754-755.
[4]RONQUIST F,HUELSENBECK J P.MrBayes 3:Bayesianphylogenetic inference under mixed models[J].Bioinformatics,2003,19(12):1572-1574.
[5]RONQUIST F,TESLENKO M,VAN DER MARK P,et al.MrBayes 3.2:efficient Bayesian phylogenetic inference and model choice across a large model space[J].Systematic Biology,2012,61(3):539-542.
[6]DRUMMOND A J,RAMBAUT A.BEAST:Bayesian evolu-tionary analysis by sampling trees[J].BMC Evolutionary Biology,2007,7(1):1-8.
[7]METROPOLIS N,ROSENBLUTH A W,ROSENBLUTH MN,et al.Equation of state calculations by fast computing machines[J].The Journal of Chemical Physics,1953,21(6):1087-1092.
[8]HASTINGS W K.Monte Carlo sampling methods using Markov chains and their applications[J].Biometrika,1970,57(1):97-109.
[9]NEAL R M.MCMC using Hamiltonian dynamics[J].Handbook of Markov Chain Monte Carlo,2011,2(11):2.
[10]DINH V,BILGE A,ZHANG C,et al.Probabilistic path hamiltonian monte carlo[C]//International Conference on Machine Learning.PMLR,2017:1009-1018.
[11]BILLERA L J,HOLMES S P,VOGTMANN K.Geometry of the space of phylogenetic trees[J].Advances in Applied Mathematics,2001,27(4):733-767.
[12]JUKES T H,CANTOR C R.Evolution of protein molecules[J].Mammalian Protein Metabolism,1969,3:21-132.
[13]YANG Z.Estimating the pattern of nucleotide substitution[J].Journal of Molecular Evolution,1994,39(1):105-111.
[14]MAU B,NEWTON M A.Phylogenetic inference for binary data on dendograms using Markov chain Monte Carlo[J].Journal of Computational and Graphical Statistics,1997,6(1):122-131.
[15]YANG Z,RANNALA B.Bayesian phylogenetic inference using DNA sequences:a Markov Chain Monte Carlo method[J].Molecular Biology and Evolution,1997,14(7):717-724.
[16]BROMHAM L,DUCHÊNE S,HUA X,et al.Bayesian molecular dating:opening up the black box[J].Biological Reviews,2018,93(2):1165-1191.
[17]YANG Z,RANNALA B.Molecular phylogenetics:principlesand practice[J].Nature Reviews Genetics,2012,13(5):303-314.
[18]BETANCOURT M.A conceptual introduction to HamiltonianMonte Carlo[J].arXiv:1701.02434,2017.
[19]AFSHAR H M,DOMKE J.Reflection,Refraction,and Hamiltonian Monte Carlo[C]//NIPS.2015:3007-3015.
[20]FELSENSTEIN J.Evolutionary trees from DNA sequences:amaximum likelihood approach[J].Journal of molecular evolution,1981,17(6):368-376.
[21]JI X,ZHANG Z,HOLBROOK A,et al.Gradients do grow on trees:a linear-time O(N)-dimensional gradient for statistical phylogenetics[J].Molecular Biology and Evolution,2020,37(10):3047-3060.
[22]BOU-RABEE N,SANZ-SERNA J M.Geometric integrators and the Hamiltonian Monte Carlo method[J].Acta Numerica,2018,27:113-206.
[23]WHIDDEN C,MATSEN IV F A.Quantifying MCMC exploration of phylogenetic tree space[J].Systematic Biology,2015,64(3):472-491.
[24]MANGOUBI O,PILLAI N S,SMITH A.Does HamiltonianMonte Carlo mix faster than a random walk on multimodal densities?[J].arXiv:1808.03230,2018.
[25]PARK J.Sampling from multimodal distributions using tem-pered Hamiltonian transitions[J].arXiv:2111.06871,2021.
[26]LAKNER C,VAN DER MARK P,HUELSENBECK J P,et al.Efficiency of Markov chain Monte Carlo tree proposals in Baye-sian phylogenetics[J].Systematic Biology,2008,57(1):86-103.
[27]ZHOU G.Metropolis Augmented Hamiltonian Monte Carlo[J].arXiv:2201.08044,2022.
[28]SUN S,SHEN Y.“Parallel-Tempering”-Assisted Hybrid Monte Carlo Algorithm for Bayesian Inference in Dynamical Systems[C]//UK Workshop on Computational Intelligence.Cham:Springer,2019:357-368.
[29]HUELSENBECK J P.Performance of phylogenetic methods in simulation[J].Systematic Biology,1995,44(1):17-48.
[30]NEI M,TAKEZAKI N,SITNIKOVA T.Assessing molecular phylogenies[J].Science,1995,267(5195):253-255.
[31]XIE Q,BU W,ZHENG L.The Bayesian phylogenetic analysis of the 18S rRNA sequences from the main lineages of Trichophora(Insecta:Heteroptera:Pentatomomorpha)[J].Molecular Phylogenetics and Evolution,2005,34(2):448-451.
[32]XIE Q,TIAN Y,ZHENG L,et al.18S rRNA hyper-elongation and the phylogeny of Euhemiptera(Insecta:Hemiptera)[J].Molecular Phylogenetics and Evolution,2008,47(2):463-471.
[33]XIE Q,WANG Y,LIN J,et al.Potential key bases of ribosomal RNA to kingdom-specific spectra of antibiotic susceptibility and the possible archaeal origin of eukaryotes[J].PloS one,2012,7(1):e29468.
[34]GEYER C J.Markov Chain Monte Carlo Maximum Likelihood[M].American Cancer Society,2005.
[35]NYLANDER J A A,FREDRIK R,HUELSENBECK J P,et al.Bayesian Phylogenetic Analysis of Combined Data[J].Systema-tic Biology,2004,53(1):47-67.
[36]RAMBAUT A,DRUMMOND A J,XIE D,et al.Posterior summarization in Bayesian phylogenetics using Tracer 1.7[J].Systematic Biology,2018,67(5):901-904.
[37]SUKUMARAN J,HOLDER M T.DendroPy:a Python library for phylogenetic computing[J].Bioinformatics,2010,26(12):1569-1571.
[38]HUERTA-CEPAS J,SERRA F,BORK P.ETE 3:reconstruction,analysis,and visualization of phylogenomic data[J].Mole-cular Biology and Evolution,2016,33(6):1635-1638.
[39]SMITH M R.Robust analysis of phylogenetic tree space[J].Systematic Biology,2022,71(5):1255-1270.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!