Computer Science (计算机科学) ›› 2024, Vol. 51 ›› Issue (8): 117-123. doi: 10.11896/jsjkx.231100014
魏想想, 孟朝晖
WEI Xiangxiang, MENG Zhaohui
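Abstract: Proteins are molecules with a three-dimensional spatial structure. The main goal of protein structure prediction is to extract useful information from existing large-scale protein datasets in order to predict the structures of naturally occurring proteins. One problem in current protein structure prediction experiments is the lack of datasets that further reflect the spatial structural features of proteins. Although the mainstream PDB protein dataset is experimentally determined, it does not exploit the spatial features of proteins, and it is mixed with nucleic acid data and contains some incomplete entries. To address these problems, this paper studies protein prediction from the perspective of spatial structure. Building on the original PDB dataset, it proposes the Hohai Graphic Protein Data Bank (HohaiGPDB), a graph-structured dataset that expresses the spatial structural features of proteins. Protein structure prediction experiments on the new dataset using a conventional Transformer network model achieve a prediction accuracy of up to 59.38% on HohaiGPDB, which demonstrates the research value of the dataset. HohaiGPDB can serve as a general-purpose dataset for protein-related research.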
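The abstract does not say how HohaiGPDB turns a PDB entry into a graph. As a rough illustration of what a graph-structured representation of spatial structure can look like, the sketch below builds a residue contact graph: each residue is a node, and two residues are linked when their C-alpha atoms lie within a distance cutoff. The function name `contact_graph`, the 8.0 Å cutoff, and the toy coordinates are all assumptions made for this example and are not taken from the paper.

```python
import math

def contact_graph(ca_coords, cutoff=8.0):
    """Build an undirected residue contact graph as an adjacency list.

    ca_coords: list of (x, y, z) C-alpha coordinates, one per residue.
    cutoff:    distance threshold in angstroms (assumed value, not from the paper).
    """
    n = len(ca_coords)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Link two residues when they are spatially close,
            # regardless of how far apart they are in the sequence.
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                graph[i].append(j)
                graph[j].append(i)
    return graph

# Toy coordinates, made up for illustration (not real PDB data).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
graph = contact_graph(coords)
print(graph)
```

In this toy case the first three residues are mutually within the cutoff, while the fourth is isolated. A real pipeline would also need to parse coordinates from PDB files and drop nucleic acid chains and incomplete entries, which are the data-quality issues the abstract mentions.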