Computer Science, 2023, Vol. 50, Issue (3): 164-172. doi: 10.11896/jsjkx.211200186

• Database & Big Data & Data Science •

Improving RNA Base Interactions Prediction Based on Transfer Learning and Multi-view Feature Fusion

WANG Xiaofei, FAN Xueqiang, LI Zhangwei

  1. College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2021-12-16 Revised: 2022-05-11 Online: 2023-03-15 Published: 2023-03-15
  • Corresponding author: LI Zhangwei (lzw@zjut.edu.cn)
  • About the authors: WANG Xiaofei (xiaowangpluss@163.com), born in 1995, postgraduate. His main research interests include computer vision and bioinformatics.
    LI Zhangwei, born in 1967, Ph.D., associate professor, member of the China Computer Federation. His main research interests include intelligent information processing.
  • Supported by:
    National Natural Science Foundation of China (61573317).

Abstract: RNA base interactions play an important role in stabilizing the three-dimensional structure of RNA, and accurate prediction of base interactions can assist RNA three-dimensional structure prediction. However, the small amount of data available for this task prevents models from fully learning the feature distribution of the data, and characteristics of the data itself (symmetry and class imbalance) further degrade model performance. To address insufficient learning and these data characteristics, tpRNA, a high-performance deep-learning-based method for RNA base interaction prediction, is proposed. tpRNA is the first method to introduce transfer learning into RNA base interaction prediction, mitigating the insufficient learning caused by scarce data. It also proposes an efficient loss function and a feature extraction module that exploit the strengths of transfer learning and convolutional neural networks in feature learning to alleviate the problems posed by the data characteristics. Results show that transfer learning reduces the model bias caused by limited data, the proposed loss function improves model training, and the feature extraction module extracts more effective features. Compared with state-of-the-art methods, tpRNA shows significant advantages when the input features are of low quality.

Key words: RNA base interactions, transfer learning, data characteristics, loss function, convolutional neural networks
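The abstract names two data characteristics, symmetry and class imbalance, but does not give the form of tpRNA's loss function. As a hedged illustration only, and not the authors' loss, the sketch below shows one common way to account for both in pairwise prediction, assuming the model emits an L×L matrix of raw scores (logits) for a sequence of length L: the scores are averaged with their transpose so that pair (i, j) and pair (j, i) agree, each unordered pair is counted once, and the rare positive (interacting) class is up-weighted in a binary cross-entropy. The function names, the `min_sep` parameter and the negative-to-positive ratio used as the positive weight are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def symmetrize(logits: Sequence[Sequence[float]]) -> Matrix:
    """Average each (i, j) score with its mirror (j, i), so the result is symmetric."""
    n = len(logits)
    return [[0.5 * (logits[i][j] + logits[j][i]) for j in range(n)] for i in range(n)]


def positive_weight(labels: Sequence[Sequence[int]], min_sep: int = 1) -> float:
    """Assumed heuristic: negative/positive pair ratio over the upper triangle (j >= i + min_sep)."""
    n = len(labels)
    pos = sum(1 for i in range(n) for j in range(i + min_sep, n) if labels[i][j])
    neg = sum(1 for i in range(n) for j in range(i + min_sep, n) if not labels[i][j])
    return neg / pos if pos else 1.0


def weighted_symmetric_bce(
    logits: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    pos_weight: float,
    min_sep: int = 1,
) -> float:
    """Positive-weighted binary cross-entropy, counting each unordered pair (i, j) once."""
    s = symmetrize(logits)
    n = len(s)
    total, count = 0.0, 0
    for i in range(n):
        # Only j >= i + min_sep: (i, j) and (j, i) describe the same base interaction.
        for j in range(i + min_sep, n):
            x = s[i][j]
            if labels[i][j]:
                total += pos_weight * softplus(-x)  # -log(sigmoid(x))
            else:
                total += softplus(x)  # -log(1 - sigmoid(x))
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy 3-nt example: only bases 0 and 2 interact (the label matrix is symmetric).
    labels = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
    logits = [[0.0] * 3 for _ in range(3)]
    w = positive_weight(labels)  # 2 negative pairs / 1 positive pair = 2.0
    print(weighted_symmetric_bce(logits, labels, w))  # ≈ 0.9242 (= 4/3 * ln 2)
```

Because the scores are symmetrized before the loss is computed, transposing the model output leaves the loss unchanged, and without the positive weight the few interacting pairs would contribute little to the total. A focal loss is another common option for class imbalance; which one tpRNA actually uses is not stated in the abstract.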

CLC number: TP301