Computer Science, 2024, Vol. 51, Issue (6A): 230600225-7. doi: 10.11896/jsjkx.230600225
梁方, 徐旭瑶, 赵凯龙, 赵炫锋, 张贵军
LIANG Fang, XU Xuyao, ZHAO Kailong, ZHAO Xuanfeng, ZHANG Guijun
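Abstract: As protein structure prediction has moved from traditional force-field-driven methods to today's data-driven AI structure modeling, template detection has remained a key step, and detecting high-accuracy remote structural templates is important for improving prediction accuracy. This study proposes ASEalign, a remote homologous template detection algorithm based on adaptive eigenvector extraction. First, a deep learning model that fuses multiple sources of feature information predicts the protein contact map. Next, a scoring function is designed that combines multi-dimensional features, including the contact map, secondary structure, sequence profile-profile alignment, and solvent accessibility, and templates are aligned by adaptively extracting eigenvalues and eigenvectors from the contact map matrix. Finally, the detected high-quality templates are fed into AlphaFold2 for structure modeling. Results on a test set of 135 proteins show that ASEalign improves accuracy by 11.5% over HHsearch, a mainstream template detection algorithm, and that its structure modeling accuracy is higher than that of AlphaFold2.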
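The eigen-decomposition step named in the abstract can be illustrated with a minimal sketch. The abstract does not state ASEalign's actual selection rule, so everything here is an assumption made for illustration: the function names (`mat_vec`, `leading_eigenpair`, `top_eigenpairs`), the `energy` threshold, and the rule of keeping the fewest leading eigenpairs whose squared eigenvalues cover a given fraction of the squared Frobenius norm of the contact map. Power iteration with deflation stands in for whatever eigensolver the authors used.

```python
import math


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def leading_eigenpair(matrix, iters=500):
    """Power iteration on a symmetric matrix.

    Returns (eigenvalue, unit eigenvector) for the eigenvalue of largest
    magnitude. It does not converge when two eigenvalues share that
    magnitude with opposite signs; a library eigensolver avoids this.
    """
    n = len(matrix)
    vec = [1.0 + 0.01 * i for i in range(n)]  # deterministic start vector
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iters):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, vec
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged unit vector.
    eigenvalue = sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


def top_eigenpairs(contact_map, energy=0.9):
    """Keep the fewest leading eigenpairs whose squared eigenvalues cover
    `energy` of the squared Frobenius norm (hypothetical selection rule,
    not taken from the paper)."""
    residual = [list(map(float, row)) for row in contact_map]
    # For a symmetric matrix, the sum of squared entries equals the
    # sum of squared eigenvalues, so no full decomposition is needed.
    total = sum(x * x for row in residual for x in row)
    pairs, covered = [], 0.0
    while len(pairs) < len(residual) and covered < energy * total:
        eigenvalue, vec = leading_eigenpair(residual)
        if abs(eigenvalue) < 1e-9:
            break
        pairs.append((eigenvalue, vec))
        covered += eigenvalue * eigenvalue
        # Deflation: subtract the component that was just extracted.
        for i, vi in enumerate(vec):
            for j, vj in enumerate(vec):
                residual[i][j] -= eigenvalue * vi * vj
    return pairs


toy_map = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]  # toy 3-residue contact map
selected = top_eigenpairs(toy_map, energy=0.9)
print([round(value, 6) for value, _ in selected])
```

On the toy map the eigenvalues are 2, 1, and 0, so a 0.9 threshold keeps two eigenpairs while a 0.75 threshold keeps only the first. The number of retained eigenvectors therefore adapts to each contact map rather than being fixed in advance, which is the general idea the sketch is meant to convey.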