Computer Science ›› 2022, Vol. 49 ›› Issue (1): 285-291. doi: 10.11896/jsjkx.201100117

• Artificial Intelligence •

Protein Solubility Prediction Based on Sequence Feature Fusion

NIU Fu-sheng, GUO Yan-bu, LI Wei-hua, LIU Wen-yang

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
  • Received: 2020-11-16 Revised: 2021-06-29 Online: 2022-01-15 Published: 2022-01-18
  • Corresponding author: LI Wei-hua (lywey@163.com)
  • Author e-mail: 17839164754@163.com
  • About author: NIU Fu-sheng, born in 1993, postgraduate. His main research interests include deep learning and bioinformatics.
    LI Wei-hua, born in 1977, Ph.D, associate professor. Her main research interests include data mining and bioinformatics.
  • Supported by:
    National Natural Science Foundation of China (32060151), Scientific Research Foundation of the Education Department of Yunnan Province, China (2019J0006), Innovative Research Team of Yunnan Province, China (2018HC019) and Yunnan University Postgraduate Research and Innovation Foundation Project, China (2020Z73).

Abstract: Protein solubility plays an important role in drug design research. Traditional biological experiments for measuring protein solubility are time-consuming and laborious, so predicting solubility with computational methods has become an important research direction in bioinformatics. To address the insufficient representation of protein features in traditional solubility prediction models, this paper designs PSPNet, a neural network model based on multiple kinds of protein sequence information, and applies it to protein solubility prediction. PSPNet first represents a protein sequence using amino acid residue sequence embedding information and amino acid sequence evolutionary information. A convolutional neural network then extracts the local key information from the amino acid sequence embedding features, and a bidirectional LSTM network extracts the long-range dependency features of the protein sequence. Finally, an attention mechanism fuses these features with the amino acid evolutionary information, and the fused feature, which contains multiple kinds of sequence information, is used for protein solubility prediction. Experimental results show that, compared with the benchmark methods, PSPNet improves the accuracy of protein solubility prediction and has good scalability.

Key words: Attention mechanism, Deep learning, Multi-feature fusion, Protein solubility
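
The final fusion step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the attention mechanism combines the sequence features with the evolutionary information, so the code below assumes one plausible reading, namely dot-product attention pooling over per-residue sequence features followed by concatenation with an evolutionary feature vector and a logistic output. The function names (`attention_pool`, `fuse_and_predict`), the `query` vector, and `out_weights` are all hypothetical, and the inputs are stand-ins for the outputs of the CNN and bidirectional LSTM stages.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Pool per-residue feature vectors into one vector.

    Assumption: plain dot-product attention against a learned `query`
    vector; the paper's actual scoring function is not given in the abstract.
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def fuse_and_predict(seq_features, evo_features, query, out_weights, bias=0.0):
    """Fuse pooled sequence features with evolutionary features and
    return a solubility probability plus the attention weights.

    Assumption: fusion is simple concatenation followed by a logistic layer.
    """
    pooled, weights = attention_pool(seq_features, query)
    fused = pooled + evo_features  # list concatenation
    logit = sum(w * x for w, x in zip(out_weights, fused)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, weights


if __name__ == "__main__":
    # Toy stand-ins: 4 residues with 3-dimensional sequence features,
    # and a 2-dimensional evolutionary (e.g. PSSM-derived) summary vector.
    seq_features = [[0.2, 0.1, 0.0], [0.9, 0.3, 0.5], [0.1, 0.8, 0.4], [0.0, 0.2, 0.7]]
    evo_features = [0.6, -0.1]
    query = [1.0, 0.5, -0.5]
    out_weights = [0.4, -0.2, 0.3, 0.8, 0.1]
    prob, weights = fuse_and_predict(seq_features, evo_features, query, out_weights)
    print("attention weights:", [round(w, 3) for w in weights])
    print("predicted solubility probability:", round(prob, 3))
```

In a trained model the `query` and `out_weights` values would be learned parameters; here they are fixed by hand only to make the data flow visible.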

CLC Number:

  • TP391