Computer Science, 2024, Vol. 51, Issue (8): 117-123. doi: 10.11896/jsjkx.231100014

• Database & Big Data & Data Science •

Hohai Graphic Protein Data Bank and Prediction Model

WEI Xiangxiang, MENG Zhaohui   

  1. School of Computer and Software, Hohai University, Nanjing 211106, China
  • Received: 2023-11-01 Revised: 2024-03-05 Online: 2024-08-15 Published: 2024-08-13
  • About author: WEI Xiangxiang, born in 1999, master. His main research interests include artificial intelligence and neural networks.
    MENG Zhaohui, born in 1968, associate professor. His main research interests include neural networks and artificial intelligence.

Abstract: Proteins are molecules with a spatial structure. The main goal of protein structure prediction is to extract useful information from existing large-scale protein datasets in order to predict the structures of naturally occurring proteins. One current problem in protein structure prediction experiments is the lack of datasets that directly reflect the spatial structure of proteins. Although the data in the mainstream PDB (Protein Data Bank) are experimentally measured, the PDB does not exploit the spatial characteristics of proteins, is mixed with nucleic acid data, and contains incomplete entries. To address these problems, this paper studies protein structure prediction from the perspective of spatial structure. Based on the original PDB, the Hohai Graphic Protein Data Bank (HohaiGPDB) is proposed. The dataset expresses the spatial structure characteristics of proteins as graph structures. Protein structure prediction experiments are carried out on the new dataset using the traditional Transformer network model, and the prediction accuracy on HohaiGPDB can reach 59.38%, which indicates the research value of HohaiGPDB. HohaiGPDB could be used as a general dataset for protein-related studies.

Key words: Hohai graphic protein data bank, Protein spatial structure, Protein structure prediction, Transformer model

CLC Number: 

  • TP391
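
The abstract says the dataset expresses protein spatial structure as a graph but does not give the construction. As a rough illustration only, one common way to turn 3D coordinates into a graph is a residue contact graph: each residue is a node, and two residues are linked when their C-alpha atoms lie within a distance cutoff. The function name build_contact_graph, the 8.0 angstrom cutoff, and the toy coordinates below are assumptions made for this sketch, not details taken from the paper.

```python
import math


def build_contact_graph(coords, cutoff=8.0):
    """Build an undirected contact graph from 3D coordinates.

    coords: list of (x, y, z) tuples, one per residue (hypothetical C-alpha positions).
    cutoff: distance threshold in angstroms (an assumed value, not from the paper).
    Returns a dict mapping each residue index to a list of neighbor indices.
    """
    graph = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Link two residues when they are close in space,
            # regardless of how far apart they are in the sequence.
            if math.dist(coords[i], coords[j]) <= cutoff:
                graph[i].append(j)
                graph[j].append(i)
    return graph


# Toy coordinates (made up): three residues spaced 3.8 angstroms apart
# along one axis, plus one residue far away from the rest.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
graph = build_contact_graph(ca_coords)
```

In this toy case the first three residues are all mutually connected and the distant fourth residue has no neighbors. A real pipeline would read coordinates from PDB entries and would likely attach residue types or other features to the nodes.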