Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230600225-7. doi: 10.11896/jsjkx.230600225

• Artificial Intelligence •

Remote Template Detection Algorithm and Its Application in Protein Structure Prediction

LIANG Fang, XU Xuyao, ZHAO Kailong, ZHAO Xuanfeng, ZHANG Guijun

  1. School of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
  • Published: 2024-06-06
  • Corresponding author: ZHANG Guijun (zgj@zjut.edu.cn)
  • About author: LIANG Fang, born in 1999, research assistant. Her main research interests include intelligent information processing, optimization theory and algorithm design, and bioinformatics.
    ZHANG Guijun, born in 1974, Ph.D, professor, Ph.D supervisor, is a member of CCF (No.50785G). His main research interests include intelligent information processing, optimization theory and algorithm design, and bioinformatics.
  • Supported by:
    National Natural Science Foundation of China (62173304) and National Key Research and Development Program of China (2019YFE0126100).

Abstract: Across the development from traditional force-field-driven protein structure prediction to current data-driven AI structure modeling, template detection has remained a key step in protein structure prediction, and detecting high-accuracy remote structural templates is important for improving prediction accuracy. This paper proposes ASEalign, a remote homology template detection algorithm based on adaptive eigenvector extraction. First, a deep learning technique that fuses multiple features is used to predict protein contact maps. Then, a multi-dimensional scoring function is designed that fuses contact maps, secondary structure, sequence profile-profile alignment, and solvent accessibility, and template alignment is performed by adaptively extracting the eigenvalues and eigenvectors of the contact map matrix. Finally, the detected high-quality templates are input to AlphaFold2 for structure modeling. Results on a test set of 135 proteins show that ASEalign improves accuracy by 11.5% over the mainstream template detection algorithm HHsearch, and that the accuracy of the resulting structure models is better than that of AlphaFold2.

Key words: Template detection, Template modeling, Contact map prediction, Deep learning, Secondary structure
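
The abstract states that template alignment relies on adaptively extracting eigenvalues and eigenvectors from the contact map matrix, but gives no further detail. The sketch below is only an illustration of that general idea, not the ASEalign implementation: the function names, the cumulative squared-eigenvalue threshold `energy`, the sign convention, and the dot-product similarity are all assumptions introduced here.

```python
import numpy as np


def adaptive_eigen_features(contact_map, energy=0.9):
    """Per-residue features from the leading eigenpairs of a contact map.

    Hypothetical "adaptive" rule: keep the fewest eigenpairs whose squared
    eigenvalues cover the fraction `energy` of the total.
    """
    c = np.asarray(contact_map, dtype=float)
    c = (c + c.T) / 2.0                      # enforce symmetry
    vals, vecs = np.linalg.eigh(c)           # eigendecomposition of a symmetric matrix
    order = np.argsort(-np.abs(vals))        # largest magnitude first
    vals, vecs = vals[order], vecs[:, order]

    total = np.sum(vals ** 2)
    if total == 0.0:                         # empty contact map: no signal
        return np.zeros((c.shape[0], 1))

    cumulative = np.cumsum(vals ** 2) / total
    k = min(int(np.searchsorted(cumulative, energy)) + 1, len(vals))
    vals, vecs = vals[:k], vecs[:, :k]

    # Eigenvectors are defined up to sign; make the largest-magnitude
    # component of each one positive so features are comparable.
    top = np.argmax(np.abs(vecs), axis=0)
    signs = np.sign(vecs[top, np.arange(k)])
    signs[signs == 0] = 1.0
    return vecs * signs * np.sqrt(np.abs(vals))


def residue_similarity(query_features, template_features):
    """Dot-product similarity between every query and template residue."""
    k = min(query_features.shape[1], template_features.shape[1])
    return query_features[:, :k] @ template_features[:, :k].T


# Usage: a toy 5-residue chain where residues i and i+1 are in contact.
chain = np.eye(5, k=1) + np.eye(5, k=-1)
features = adaptive_eigen_features(chain)
scores = residue_similarity(features, features)
```

A similarity matrix of this kind could serve as one term of a residue-level scoring function alongside the secondary structure, profile-profile, and solvent accessibility terms named in the abstract; how ASEalign actually weights and combines those terms is not specified here.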

CLC number:

  • TP389