Computer Science, 2022, Vol. 49, Issue (2): 285-291. doi: 10.11896/jsjkx.201100195

• Artificial Intelligence •

Ensemble Learning Method for Nucleosome Localization Prediction

CHEN Wei, LI Hang, LI Wei-hua

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
  • Received: 2020-11-26 Revised: 2021-04-19 Online: 2022-02-15 Published: 2022-02-23
  • Corresponding author: LI Wei-hua (lywey@163.com)
  • About author: 2810925735@qq.com
    CHEN Wei, born in 1997, postgraduate. His main research interests include deep learning and bioinformatics.
    LI Wei-hua, corresponding author, born in 1977, Ph.D., associate professor. Her main research interests include data mining and bioinformatics.
  • Supported by:
    Scientific Research Foundation of the Education Department of Yunnan Province, China (2019J0006) and Innovative Research Team of Yunnan Province, China (2018HC019).

Abstract: Nucleosome localization refers to the position of the DNA double helix relative to the histones, and it plays an important regulatory role during DNA transcription. Measuring nucleosome localization through biological experiments consumes considerable time and resources, so predicting it computationally from DNA sequences has become an important research direction. To address the limitations of a single model and a single encoding in representing and learning DNA sequence features for nucleosome localization prediction, this paper proposes FuseENup, an end-to-end ensemble deep learning model. FuseENup represents DNA data from multiple dimensions with three encoding methods, uses different models to extract the key latent features from each dimension, and thereby constructs a new DNA sequence representation model. In 20-fold cross-validation on four datasets, compared with CORENup, currently the model with the best overall performance on nucleosome localization prediction, FuseENup improves accuracy and precision by 3% and 9% on the HS dataset, 2% and 6% on the DM dataset, and 1% and 4% on the E dataset. FuseENup also outperforms other machine learning and deep learning baseline models. These experimental results show that FuseENup improves the accuracy of nucleosome localization prediction, demonstrating the effectiveness and soundness of the method.

Key words: Cross-validation, Deep learning, DNA sequence coding, Ensemble learning method, Nucleosome localization
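The abstract does not name FuseENup's three encodings or say exactly how its branches are combined. As a hypothetical illustration only, the sketch below encodes a DNA sequence in three common ways (one-hot, k-mer frequency, and integer tokens for an embedding layer) and fuses per-branch probabilities by soft voting. Because FuseENup is described as end-to-end, its actual fusion may happen inside a single network instead; soft voting is a simpler, swapped-in technique, and every function name here is invented for the example.

```python
from __future__ import annotations

from collections import Counter
from itertools import product

BASES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Per-position 4-dim indicator vectors (A, C, G, T); other symbols such as N map to zeros."""
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Relative frequency of each of the 4**k k-mers, ordered AAA, AAC, ..., TTT for k = 3."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Skip windows containing ambiguous bases such as N.
    counts = Counter(w for w in windows if set(w) <= set(BASES))
    total = sum(counts.values()) or 1  # avoid division by zero for short or all-N input
    return [counts["".join(p)] / total for p in product(BASES, repeat=k)]


def integer_tokens(seq: str) -> list[int]:
    """Integer ids 1-4 for A, C, G, T and 0 for anything else (e.g. input to an embedding layer)."""
    ids = {b: i + 1 for i, b in enumerate(BASES)}
    return [ids.get(b, 0) for b in seq.upper()]


def soft_vote(branch_probs: list[list[float]], weights: list[float] | None = None) -> list[float]:
    """Weighted mean of each branch's positive-class probabilities, sample by sample.

    branch_probs holds one list of per-sample probabilities per branch (encoding/model pair).
    """
    if weights is None:
        weights = [1.0] * len(branch_probs)
    if len(weights) != len(branch_probs):
        raise ValueError("expected one weight per branch")
    total = sum(weights)
    # zip(*branch_probs) yields, for each sample, the tuple of probabilities from all branches.
    return [sum(w * p for w, p in zip(weights, sample)) / total
            for sample in zip(*branch_probs)]


def predict_labels(fused: list[float], threshold: float = 0.5) -> list[int]:
    """Threshold fused probabilities; 1 = nucleosome-forming (assumed label convention)."""
    return [1 if p >= threshold else 0 for p in fused]
```

For example, `integer_tokens("ACGTN")` gives `[1, 2, 3, 4, 0]`, and `predict_labels(soft_vote([[0.9, 0.2], [0.7, 0.4]]))` gives `[1, 0]`. Each encoding would feed its own model in a multi-branch design, which is why the fusion step takes one probability list per branch.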

CLC number: TP183
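The abstract reports results from 20-fold cross-validation in terms of accuracy and precision. A minimal, framework-free sketch of that evaluation protocol follows. The shuffled, unstratified split, the fixed seed, computing the metrics once on out-of-fold predictions pooled across all folds (rather than averaging per-fold scores), and the hypothetical `fit_predict` callback are all assumptions for illustration, not details taken from the paper.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence


def k_fold_indices(n: int, k: int = 20, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FP) for the positive class 1; 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def cross_validate(X: Sequence, y: Sequence[int],
                   fit_predict: Callable[[list, list, list], list[int]],
                   k: int = 20, seed: int = 0) -> tuple[float, float]:
    """k-fold CV returning (accuracy, precision) on pooled out-of-fold predictions.

    fit_predict(train_X, train_y, test_X) is a hypothetical callback that trains a fresh
    model on the training folds and returns 0/1 labels for test_X.
    """
    y_true, y_pred = [], []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        y_true.extend(y[i] for i in test_idx)
        y_pred.extend(preds)
    return accuracy(y_true, y_pred), precision(y_true, y_pred)
```

Pooling avoids an undefined per-fold precision when a small fold happens to contain no positive predictions. As a sanity check, a model that always predicts class 1 on a balanced label set scores 0.5 on both metrics.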