Computer Science ›› 2020, Vol. 47 ›› Issue (5): 64-71. doi: 10.11896/jsjkx.191100027

• Database & Big Data & Data Science •

A Survey of Enhancer-Promoter Interaction Prediction Based on Multi-feature Fusion

HU Yu-jia, GAN Wei, ZHU Min

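  1. College of Computer Science, Sichuan University, Chengdu 610065
  • Received: 2019-11-05 Online: 2020-05-15 Published: 2020-05-19
  • Corresponding author: ZHU Min (zhumin@scu.edu.cn)
  • About author: 543574831@qq.com
  • Supported by:
    National Science and Technology Major Project of the 13th Five-Year Plan (2018ZX10201002-002-004)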

Enhancer-Promoter Interaction Prediction Based on Multi-feature Fusion

HU Yu-jia, GAN Wei, ZHU Min   

  1. College of Computer Science, Sichuan University, Chengdu 610065, China
  • Received: 2019-11-05 Online: 2020-05-15 Published: 2020-05-19
  • About author: HU Yu-jia, born in 1995, postgraduate, is a member of China Computer Federation. Her main research interests include data mining and bioinformatics.
    ZHU Min, born in 1971, Ph.D., professor, is a senior member of China Computer Federation. Her main research interests include bioinformatics, information visualization and visual analytics.
  • Supported by:
    This work was supported by the National Major Scientific and Technological Project during the Thirteenth Five-Year Plan (2018ZX10201002-002-004).

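Abstract: Studying the mechanism of enhancer-promoter interaction helps in understanding gene regulatory relationships, which in turn reveals disease-related genes and provides new ideas and methods for disease diagnosis and treatment. Traditional biological detection methods are experimentally costly and time-consuming, and, limited by resolution, can hardly identify individual enhancer-promoter interactions precisely. Solving biological problems with computational methods has become a research hotspot in recent years; such methods can actively learn sequence features and spatial structures through complex network structures and thereby accurately predict enhancer-promoter interactions. This paper first introduces the research status of traditional biological experimental detection methods. Then, from the perspective of sequence features and around the basic idea of multi-feature fusion, it summarizes the application of statistical and deep learning methods to enhancer-promoter interaction prediction. Finally, it summarizes and analyzes the research hotspots and challenges in this field.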

Keywords: Multi-feature fusion, Disease diagnosis and treatment, Sequence features, Application survey, Enhancer-promoter interaction

Abstract: Studying the mechanism of enhancer-promoter interaction helps to understand gene regulation, thus revealing genes that are relevant to diseases and providing new ideas and methods for disease diagnosis and treatment. Traditional biological analysis methods are expensive and time-consuming, and their limited resolution makes it difficult to precisely identify individual interactions. Solving biological problems with computational methods has therefore become a hot research topic in recent years. Such methods can actively learn sequence features and spatial structures through complex network structures, so as to accurately predict the interactions of enhancers and promoters. This paper first introduces the research status of traditional biological detection methods. Then, from the perspective of sequence features, the application of statistical and deep learning methods to the prediction of enhancer-promoter interaction is summarized based on the basic idea of multi-feature fusion. Finally, the research hotspots and challenges in this field are summarized and analyzed.

Key words: Application overview, Disease diagnosis and treatment, Enhancer-promoter interaction, Multi-feature fusion, Sequence feature

CLC Number:

  • TP391