Computer Science ›› 2020, Vol. 47 ›› Issue (5): 64-71. doi: 10.11896/jsjkx.191100027

• Database & Big Data & Data Science •

Enhancer-Promoter Interaction Prediction Based on Multi-feature Fusion

HU Yu-jia, GAN Wei, ZHU Min

  1. College of Computer Science, Sichuan University, Chengdu 610065, China
  • Received: 2019-11-05 Online: 2020-05-15 Published: 2020-05-19
  • Corresponding author: ZHU Min (zhumin@scu.edu.cn)
  • About author: HU Yu-jia, born in 1995, postgraduate, is a member of China Computer Federation. Her main research interests include data mining and bioinformatics.
    ZHU Min, born in 1971, Ph.D., professor, is a senior member of China Computer Federation. Her main research interests include bioinformatics, information visualization and visual analytics.
  • Author contact: 543574831@qq.com
  • Supported by:
    This work was supported by the National Major Scientific and Technological Project during the Thirteenth Five-Year Plan (2018ZX10201002-002-004).

Abstract: Studying the mechanism of enhancer-promoter interactions helps to clarify gene regulatory relationships, reveal disease-related genes, and provide new ideas and methods for disease diagnosis and treatment. Traditional biological detection methods are expensive and time-consuming and, because of their limited resolution, have difficulty identifying individual enhancer-promoter interactions precisely. Solving biological problems with computational methods has become a hot research topic in recent years. Such methods can learn sequence features and spatial structure through complex network architectures and thereby predict enhancer-promoter interactions accurately. This paper first introduces the research status of traditional biological detection methods. Then, from the perspective of sequence features and around the basic idea of multi-feature fusion, it summarizes the application of statistical and deep learning methods to enhancer-promoter interaction prediction. Finally, it summarizes and analyzes the research hotspots and challenges in this field.

Key words: Enhancer-promoter interaction, Multi-feature fusion, Sequence feature, Application overview, Disease diagnosis and treatment

CLC Number:

  • TP391