Computer Science, 2024, Vol. 51, Issue (6A): 230600225-7. doi: 10.11896/jsjkx.230600225

• Artificial Intelligence

Remote Template Detection Algorithm and Its Application in Protein Structure Prediction

LIANG Fang, XU Xuyao, ZHAO Kailong, ZHAO Xuanfeng, ZHANG Guijun   

  1. School of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
  • Published: 2024-06-06
  • About author: LIANG Fang, born in 1999, research assistant. Her main research interests include intelligent information processing, optimization theory and algorithm design, and bioinformatics.
    ZHANG Guijun, born in 1974, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 50785G). His main research interests include intelligent information processing, optimization theory and algorithm design, and bioinformatics.
  • Supported by:
    National Natural Science Foundation of China (62173304) and National Key Research and Development Program of China (2019YFE0126100).

Abstract: As protein structure prediction has moved from traditional force-field-driven methods to current data-driven AI structure modeling, template detection has remained a key module, and detecting high-precision remote structural templates is important for improving prediction accuracy. This paper proposes ASEalign, a remote homology template detection algorithm based on adaptive eigenvector extraction. First, a deep learning technique that fuses multiple feature sources is used to predict protein contact maps. Then, a multi-dimensional scoring function is designed that fuses contact maps, secondary structures, sequence profile-profile alignment, and solvent accessibility, and template alignment is performed using eigenvalues and eigenvectors adaptively extracted from the contact map matrix. Finally, the detected high-quality templates are input to AlphaFold2 for structural modeling. Results on a test set of 135 proteins indicate that ASEalign improves accuracy by 11.5% compared with HHsearch. The accuracy of the structures modeled with its templates is also better than that of AlphaFold2.
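The eigenvector extraction and score fusion described in the abstract can be illustrated with a minimal sketch. It is not the ASEalign implementation: the function names, the dictionary keys, the weights, and the `energy` threshold are all hypothetical, and "adaptive" is assumed here to mean keeping the fewest eigenpairs whose absolute eigenvalues cover a chosen fraction of the total.

```python
import numpy as np


def adaptive_eigen_features(contact_map, energy=0.9):
    """Per-residue features from the leading eigenpairs of a contact map.

    Assumption: "adaptive" means keeping the fewest eigenpairs whose
    absolute eigenvalues sum to at least `energy` of the total.
    """
    c = np.asarray(contact_map, dtype=float)
    c = (c + c.T) / 2.0                    # enforce symmetry
    vals, vecs = np.linalg.eigh(c)         # real eigenpairs of a symmetric matrix
    order = np.argsort(-np.abs(vals))      # largest magnitude first
    vals, vecs = vals[order], vecs[:, order]
    total = np.abs(vals).sum()
    if total < 1e-12:                      # empty map: nothing to extract
        return np.zeros((c.shape[0], 0))
    cum = np.cumsum(np.abs(vals)) / total
    k = min(int(np.searchsorted(cum, energy)) + 1, len(vals))
    # Absolute values sidestep the arbitrary sign of each eigenvector.
    return np.abs(vecs[:, :k]) * np.sqrt(np.abs(vals[:k]))


def residue_pair_score(q, t, i, j, weights=(1.0, 0.5, 0.5, 0.5)):
    """Hypothetical fused score for aligning query residue i to template residue j.

    q and t are dicts with keys "eig" (L x k array), "ss" (string of
    secondary-structure codes), "sa" (relative accessibility in [0, 1])
    and "profile" (L x A array of sequence-profile rows).
    """
    w_eig, w_ss, w_sa, w_prof = weights
    k = min(q["eig"].shape[1], t["eig"].shape[1])   # compare shared components only
    eig_term = -np.abs(q["eig"][i, :k] - t["eig"][j, :k]).sum()
    ss_term = 1.0 if q["ss"][i] == t["ss"][j] else -1.0
    sa_term = 1.0 - 2.0 * abs(q["sa"][i] - t["sa"][j])
    prof_term = float(np.dot(q["profile"][i], t["profile"][j]))
    return float(w_eig * eig_term + w_ss * ss_term + w_sa * sa_term + w_prof * prof_term)
```

In a full threading pipeline, a score of this kind would typically fill a dynamic-programming matrix over all query-template residue pairs. That alignment step is omitted here.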

Key words: Template detection, Template modeling, Contact map prediction, Deep learning, Secondary structure

CLC Number: 

  • TP389