Computer Science ›› 2021, Vol. 48 ›› Issue (8): 240-245. doi: 10.11896/jsjkx.200700076

• Artificial Intelligence •

Identifying Essential Proteins by Hybrid Deep Learning Model

LIU Wen-yang, GUO Yan-bu, LI Wei-hua

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
  • Received: 2020-07-13 Revised: 2020-09-19 Published: 2021-08-10
  • Corresponding author: LI Wei-hua (lywey@163.com)
  • About author: LIU Wen-yang, born in 1993, postgraduate. His main research interests include deep learning and bioinformatics. (wyl20180901@163.com) LI Wei-hua, born in 1977, Ph.D, associate professor. Her main research interests include data mining and bioinformatics.
  • Supported by:
    Scientific Research Foundation of the Education Department of Yunnan Province, China (2019J0006), Innovative Research Team of Yunnan Province, China (2018HC019) and Yunnan University Postgraduate Research and Innovation Foundation Project, China (2019152).

Abstract: Essential proteins are proteins that are indispensable to the survival of an organism. Identifying them helps to understand the minimal requirements of cellular life and to discover disease-causing genes and drug targets, and is of great significance for disease diagnosis and treatment and for drug design. Existing methods show that integrating protein interaction networks with sequence-related features can improve the accuracy and robustness of essential protein identification. This paper integrates gene expression profiles, protein interaction networks and subcellular location information, and designs a hybrid neural network model, IEPHDL. The model is the first to use a bidirectional gated recurrent unit to learn features from gene expression profiles, and it uses a deep neural network composed of multiple fully connected layers to relearn the three kinds of data features, exploiting the strengths of the bidirectional gated recurrent unit network, the fully connected network and Node2vec in feature learning and representation to identify essential proteins effectively. Experimental results show that IEPHDL identifies essential proteins with an accuracy of 88.7%, a precision of 86.2% and an AUC of 85.2%. Its accuracy is 13%, 8.9% and 3.8% higher than that of the current best centrality method, machine learning method and deep learning method, respectively, and its other metrics are also higher than those of all three. Finally, experimental analysis confirms that the bidirectional gated recurrent unit network, by virtue of its strong feature learning ability, plays a key role in essential protein identification.

Key words: Bidirectional gated recurrent unit network, Deep learning, Essential proteins, Node2vec, Protein interaction network
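
The data flow described in the abstract can be sketched as a forward pass in NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give layer sizes, sequence length, or embedding dimension, so every constant below is arbitrary, the weights are random rather than trained, and the Node2vec embedding is a random placeholder vector instead of the output of an actual Node2vec run on the protein interaction network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(input_dim, hidden_dim):
    """Random (untrained) GRU parameters for the update, reset and candidate gates."""
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {g: (w(input_dim, hidden_dim), w(hidden_dim, hidden_dim), np.zeros(hidden_dim))
            for g in ("z", "r", "n")}

def gru_last_state(x_seq, p):
    """Run a GRU over x_seq of shape (T, input_dim) and return the final hidden state."""
    h = np.zeros(p["z"][2].shape[0])
    for x_t in x_seq:
        z = sigmoid(x_t @ p["z"][0] + h @ p["z"][1] + p["z"][2])        # update gate
        r = sigmoid(x_t @ p["r"][0] + h @ p["r"][1] + p["r"][2])        # reset gate
        n = np.tanh(x_t @ p["n"][0] + (r * h) @ p["n"][1] + p["n"][2])  # candidate state
        h = z * h + (1.0 - z) * n
    return h

def bigru_features(x_seq, fwd, bwd):
    """Concatenate the final states of a forward pass and a time-reversed pass."""
    return np.concatenate([gru_last_state(x_seq, fwd), gru_last_state(x_seq[::-1], bwd)])

def mlp_probability(features, layers):
    """ReLU hidden layers followed by a single sigmoid output unit."""
    h = features
    for W, b in layers[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = layers[-1]
    return float(sigmoid(h @ W + b)[0])

# All sizes are assumptions chosen for illustration only.
T, HIDDEN, EMB_DIM, N_COMPARTMENTS = 12, 8, 64, 11

expression = rng.normal(size=(T, 1))      # one protein's expression time series
node2vec_emb = rng.normal(size=EMB_DIM)   # placeholder for a Node2vec embedding
location = np.zeros(N_COMPARTMENTS)       # multi-hot subcellular location vector
location[3] = 1.0

fwd, bwd = init_gru(1, HIDDEN), init_gru(1, HIDDEN)
fused = np.concatenate([bigru_features(expression, fwd, bwd), node2vec_emb, location])

layers = [
    (rng.normal(scale=0.1, size=(fused.size, 32)), np.zeros(32)),
    (rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)),
]
prob = mlp_probability(fused, layers)
print(f"fused feature size: {fused.size}, P(essential) = {prob:.3f}")
```

The three feature sources are simply concatenated before the fully connected layers here; whether IEPHDL fuses them this way or with a different scheme is not stated in the abstract.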

CLC Number: 

  • TP391