Computer Science ›› 2021, Vol. 48 ›› Issue (8): 240-245. doi: 10.11896/jsjkx.200700076

• Artificial Intelligence •

Identifying Essential Proteins by Hybrid Deep Learning Model

LIU Wen-yang, GUO Yan-bu, LI Wei-hua   

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
  • Received: 2020-07-13 Revised: 2020-09-19 Published: 2021-08-10
  • About author: LIU Wen-yang, born in 1993, postgraduate. His main research interests include deep learning and bioinformatics. (wyl20180901@163.com) LI Wei-hua, born in 1977, Ph.D, associate professor. Her main research interests include data mining and bioinformatics.
  • Supported by:
    Scientific Research Foundation of the Education Department of Yunnan Province, China (2019J0006), Innovative Research Team of Yunnan Province, China (2018HC019), and Yunnan University of Postgraduate Research and Innovation Foundation Project, China (2019152).

Abstract: Essential proteins are proteins that are indispensable to the viability of an organism. Identifying essential proteins helps to understand the minimum requirements of cellular life and to discover disease-causing genes and drug targets, and it is of great significance for disease diagnosis, treatment, and drug design. Existing methods show that integrating protein interaction networks with relevant sequence features can improve the accuracy and robustness of essential protein identification. In this paper, gene expression profiles, protein interaction networks, and subcellular localization information are integrated, and a hybrid neural network model, IEPHDL, is designed. For the first time, the IEPHDL model uses a bidirectional gated recurrent unit to perform feature learning on gene expression profiles, and it uses a deep neural network composed of multiple fully connected layers to relearn the features of the three data types. The model thereby combines the strengths of the bidirectional gated recurrent unit network, the fully connected network, and Node2vec in feature learning and representation to identify essential proteins effectively. Experimental results show that IEPHDL achieves an accuracy of 88.7%, a precision of 86.2%, and an AUC of 85.2% for essential protein identification. Its accuracy is 13%, 8.9%, and 3.8% higher than that of the current best centrality method, machine learning method, and deep learning method, respectively, and its other metrics are also higher than those of the three methods. Finally, experimental analysis confirms that the bidirectional gated recurrent unit network, owing to its strong feature learning ability, plays a key role in essential protein identification.
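
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the three feature types are combined, so concatenation is an assumption here, as are all layer sizes, the random untrained weights, and every function name (`build_model`, `bigru_features`, `predict_essentiality`). The network embedding is assumed to have been computed beforehand by Node2vec, and the GRU cell follows the formulation of Cho et al. The sketch uses only the Python standard library.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """One GRU cell with update gate z, reset gate r, and candidate state n."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate; weights are random, i.e. untrained.
        self.params = {
            gate: (rand_matrix(hidden_size, input_size, rng),
                   rand_matrix(hidden_size, hidden_size, rng),
                   [0.0] * hidden_size)
            for gate in "zrn"
        }

    def _affine(self, gate, x, h):
        W, U, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._affine("z", x, h)]
        r = [sigmoid(v) for v in self._affine("r", x, h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._affine("n", x, gated)]
        # New state interpolates between the old state and the candidate.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def run_gru(cell, sequence):
    h = [0.0] * cell.hidden_size
    for x in sequence:
        h = cell.step(x, h)
    return h


def bigru_features(fwd, bwd, expression_profile):
    """Read the expression time series in both directions, join final states."""
    seq = [[value] for value in expression_profile]  # one scalar per time point
    return run_gru(fwd, seq) + run_gru(bwd, seq[::-1])


def dense(W, b, v, activation):
    return [activation(s + bi) for s, bi in zip(matvec(W, v), b)]


def build_model(hidden_size, embed_size, loc_size, dense_size, seed=0):
    """Hypothetical model container; all sizes are illustrative assumptions."""
    rng = random.Random(seed)
    fused_size = 2 * hidden_size + embed_size + loc_size
    return {
        "fwd": GRUCell(1, hidden_size, rng),
        "bwd": GRUCell(1, hidden_size, rng),
        "W1": rand_matrix(dense_size, fused_size, rng),
        "b1": [0.0] * dense_size,
        "W2": rand_matrix(1, dense_size, rng),
        "b2": [0.0],
    }


def predict_essentiality(expression, network_embedding, localization, model):
    """Concatenate the three feature vectors (an assumption) and score them."""
    fused = (bigru_features(model["fwd"], model["bwd"], expression)
             + list(network_embedding) + list(localization))
    hidden = dense(model["W1"], model["b1"], fused, relu)
    return dense(model["W2"], model["b2"], hidden, sigmoid)[0]


# Toy inputs for one protein; the values are made up for illustration only.
model = build_model(hidden_size=4, embed_size=8, loc_size=3, dense_size=6)
expression = [0.2, 0.5, 0.9, 0.4, 0.1, 0.3]   # expression level per time point
embedding = [0.1] * 8                          # stand-in for a Node2vec vector
localization = [1, 0, 1]                       # multi-hot subcellular locations
score = predict_essentiality(expression, embedding, localization, model)
print(score)
```

Because the weights are random, the printed score carries no biological meaning; the sketch only shows how a sequence encoder, a precomputed network embedding, and a localization vector might feed a shared fully connected classifier.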

Key words: Bidirectional gated recurrent unit network, Deep learning, Essential proteins, Node2vec, Protein interaction network

CLC Number: 

  • TP391