Computer Science, 2022, Vol. 49, Issue (1): 285-291. doi: 10.11896/jsjkx.201100117

• Artificial Intelligence •

Protein Solubility Prediction Based on Sequence Feature Fusion

NIU Fu-sheng, GUO Yan-bu, LI Wei-hua, LIU Wen-yang   

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
  • Received: 2020-11-16 Revised: 2021-06-29 Online: 2022-01-15 Published: 2022-01-18
  • About author: NIU Fu-sheng, born in 1993, postgraduate. His main research interests include deep learning and bioinformatics.
    LI Wei-hua, born in 1977, Ph.D, associate professor. Her main research interests include data mining and bioinformatics.
  • Supported by:
    National Natural Science Foundation of China (32060151), Scientific Research Foundation of the Education Department of Yunnan Province, China (2019J0006), Innovative Research Team of Yunnan Province, China (2018HC019) and Yunnan University of Postgraduate Research and Innovation Foundation Project, China (2020Z73).

Abstract: Protein solubility plays an important role in drug design research. Traditional biological experiments for measuring protein solubility are time-consuming and laborious, so predicting solubility with computational methods has become an active research topic in bioinformatics. To address the insufficient representation of protein features in traditional solubility prediction models, this paper designs a neural network model, PSPNet, based on protein sequence information and applies it to protein solubility prediction. PSPNet represents a protein sequence using amino acid residue sequence embedding information and amino acid sequence evolution information. First, a convolutional neural network extracts local key information from the amino acid sequence embedding features. Next, a bidirectional LSTM network extracts features that capture long-range dependencies in the protein sequence. Finally, an attention mechanism fuses these features with the amino acid evolution information, and the fused feature, which contains multiple kinds of sequence information, is used for protein solubility prediction. Experimental results show that PSPNet achieves strong protein solubility prediction performance compared with the benchmark methods and also has good scalability.
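
The fusion step described in the abstract can be illustrated with a small sketch. This is not the PSPNet implementation: the abstract does not give the attention formula or the fusion rule, so the dot-product scoring, the `query` vector, the concatenation, and the function name `attention_fuse` are all assumptions made for illustration. The sketch pools per-residue feature vectors (such as BiLSTM outputs) into one vector with attention weights, then appends a vector of evolution-derived features.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(seq_feats, evo_feat, query):
    """Attention-pool per-residue features, then append evolution features.

    seq_feats: list of T vectors (e.g. BiLSTM outputs), each of length d
    evo_feat:  one vector of evolution-derived features
    query:     hypothetical learned query vector of length d (an assumption)
    """
    # Score each residue position by its dot product with the query.
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in seq_feats]
    weights = softmax(scores)
    d = len(query)
    # The weighted sum of residue vectors gives one fixed-length summary.
    pooled = [sum(w * h[j] for w, h in zip(weights, seq_feats)) for j in range(d)]
    # Fusion by concatenation is an assumption, not the paper's stated rule.
    return weights, pooled + list(evo_feat)

# Toy inputs: three residues with 2-dimensional features.
seq_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
evo_feat = [0.2, 0.5, 0.3]
query = [1.0, 1.0]
weights, fused = attention_fuse(seq_feats, evo_feat, query)
```

In a trained model the query and the upstream features would be learned jointly, and the fused vector would feed a classifier that outputs a solubility prediction. Here the values are fixed by hand only to show the data flow.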

Key words: Attention mechanism, Deep learning, Multi-feature fusion, Protein solubility

CLC Number: TP391