Computer Science, 2022, Vol. 49, Issue (1): 285-291. doi: 10.11896/jsjkx.201100117
牛富生, 郭延哺, 李维华, 刘文洋
NIU Fu-sheng, GUO Yan-bu, LI Wei-hua, LIU Wen-yang
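Abstract: Protein solubility plays an important role in drug design research. Measuring it with traditional biological experiments is time-consuming and labor-intensive, so predicting solubility computationally has become an important research direction. To address the problem that traditional solubility prediction models cannot fully represent protein features, this paper designs PSPNet, a neural network model based on multiple kinds of protein sequence information, and applies it to protein solubility prediction. The model first represents a protein sequence using amino acid residue sequence embedding information and amino acid sequence evolutionary information. It then uses a convolutional neural network to extract local key information from the amino acid sequence embedding features. Next, it uses a bidirectional LSTM network to extract long-range dependency features of the protein sequence. Finally, it uses an attention mechanism to fuse these features with the amino acid evolutionary information, and the fused features, which contain multiple kinds of sequence information, are used for protein solubility prediction. Experimental results show that, compared with baseline methods, the model improves the accuracy of protein solubility prediction and has good scalability.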
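The final fusion step can be illustrated with a minimal, dependency-free sketch. It is not the PSPNet implementation: the abstract does not specify how attention combines the two feature sources, so the sketch assumes dot-product attention pooling over the BiLSTM outputs, followed by concatenation with an evolutionary feature vector and a logistic output. All names (`attention_pool`, `predict_solubility`, `query`, `out_weights`) and all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(hidden_states, query):
    """Weight each per-residue hidden state by softmax(h . query).

    hidden_states stands in for the BiLSTM outputs (one vector per residue).
    query is an assumed learnable attention vector (hypothetical).
    Returns the weighted-sum context vector and the attention weights.
    """
    scores = [dot(h, query) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[j] for w, h in zip(weights, hidden_states))
        for j in range(dim)
    ]
    return context, weights


def predict_solubility(hidden_states, evo_features, query, out_weights, bias):
    """Fuse the attended sequence features with evolutionary features.

    Fusion by concatenation is an assumption of this sketch, not a detail
    taken from the paper. Returns a probability in (0, 1) and the weights.
    """
    context, weights = attention_pool(hidden_states, query)
    fused = context + evo_features  # list concatenation
    logit = dot(fused, out_weights) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, weights


# Toy inputs: 3 residues, 2-dim hidden states, 3-dim evolutionary summary.
hidden = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.4]]
evo = [0.2, 0.7, 0.1]
prob, weights = predict_solubility(
    hidden, evo, query=[1.0, 0.5],
    out_weights=[0.4, -0.3, 0.8, 0.1, -0.5], bias=0.0,
)
print(prob, weights)
```

In a trained model the hidden states would come from the CNN and bidirectional LSTM stages, and `query`, `out_weights`, and `bias` would be learned parameters rather than fixed toy values.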