Computer Science (计算机科学), 2021, Vol. 48, Issue (8): 240-245. doi: 10.11896/jsjkx.200700076
刘文洋, 郭延哺, 李维华
LIU Wen-yang, GUO Yan-bu, LI Wei-hua
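Abstract: Essential proteins are proteins that an organism cannot survive without. Identifying them helps clarify the minimal requirements of cellular life and uncover disease-causing genes and drug targets, and is therefore important for disease diagnosis and treatment and for drug design. Existing methods show that integrating features of protein-protein interaction (PPI) networks and sequences can improve the accuracy and robustness of essential protein identification. This paper integrates gene expression profiles, the PPI network, and subcellular localization information, and designs a hybrid neural network model, IEPHDL. The model is the first to use a bidirectional gated recurrent unit (BiGRU) to learn features from gene expression profiles, and it uses a deep neural network composed of multiple fully connected layers to further learn from the three types of data features, exploiting the strengths of the BiGRU network, the fully connected network, and Node2vec in feature learning and representation to identify essential proteins effectively. Experiments show that IEPHDL identifies essential proteins with an accuracy of 88.7%, a precision of 86.2%, and an AUC of 85.2%; its accuracy is 13%, 8.9%, and 3.8% higher than that of the current best centrality method, machine learning method, and deep learning method, respectively, and its other metrics are also higher than those of all three. Finally, experimental analysis confirms that the BiGRU network, owing to its strong feature learning ability, plays a key role in essential protein identification.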
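The data flow described in the abstract (a BiGRU over the gene expression profile, fused with a Node2vec embedding of the PPI network and a subcellular localization vector, then passed through fully connected layers) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the layer sizes, the random untrained weights, the omission of bias terms, the toy inputs, and every name below (`GRUCell`, `bigru_features`, `predict_essential`, `build_params`) are assumptions made for illustration. The Node2vec embedding is assumed to be precomputed elsewhere and is replaced here by a random vector.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    """Random, untrained weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Bias-free GRU cell with update (z), reset (r) and candidate (n) gates."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied to previous state
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # Interpolate between the previous state and the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]

    def run(self, seq):
        """Return the final hidden state after consuming the whole sequence."""
        h = [0.0] * self.hidden_size
        for x in seq:
            h = self.step(x, h)
        return h


def bigru_features(seq, fwd, bwd):
    """Concatenate final states of a forward pass and a reversed-order pass."""
    return fwd.run(seq) + bwd.run(list(reversed(seq)))


def dense(W, v, activation):
    return [activation(s) for s in matvec(W, v)]


def predict_essential(expr_seq, ppi_embedding, subcell_vector, params):
    """Fuse the three feature types, then apply two fully connected layers."""
    fused = bigru_features(expr_seq, params["fwd"], params["bwd"]) + ppi_embedding + subcell_vector
    hidden = dense(params["W1"], fused, lambda s: max(0.0, s))  # ReLU layer
    return dense(params["W2"], hidden, sigmoid)[0]  # score in (0, 1)


def build_params(rng, hidden=4, ppi_dim=8, loc_dim=3, dense_units=6):
    fused_dim = 2 * hidden + ppi_dim + loc_dim
    return {
        "fwd": GRUCell(1, hidden, rng),
        "bwd": GRUCell(1, hidden, rng),
        "W1": rand_matrix(dense_units, fused_dim, rng),
        "W2": rand_matrix(1, dense_units, rng),
    }


rng = random.Random(0)
params = build_params(rng)
expr_seq = [[0.1], [0.4], [0.9], [0.3]]  # toy expression time series, one value per step
ppi_embedding = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in for a Node2vec vector
subcell_vector = [0.0, 1.0, 0.0]  # toy localization indicator
score = predict_essential(expr_seq, ppi_embedding, subcell_vector, params)
print(f"essentiality score: {score:.3f}")
```

Because the weights are random, the printed score carries no biological meaning; the sketch only shows how the three feature sources could be concatenated before the fully connected layers. In practice a framework-provided bidirectional GRU layer with trained weights would replace the hand-written cell.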