Computer Science (计算机科学) ›› 2020, Vol. 47 ›› Issue (5): 64-71. doi: 10.11896/jsjkx.191100027
HU Yu-jia (胡宇佳), GAN Wei (甘伟), ZHU Min (朱敏)
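Abstract: Studying the mechanism of enhancer-promoter interactions helps clarify gene regulatory relationships, reveal disease-associated genes, and provide new ideas and methods for disease diagnosis and treatment. Traditional biological detection methods are costly and time-consuming, and their limited resolution makes it difficult to identify individual enhancer-promoter interactions precisely. Solving biological problems with computational methods has become a research focus in recent years; such methods can actively learn sequence features and spatial structure through complex network architectures and thereby accurately predict enhancer-promoter interactions. This paper first introduces the state of research on traditional experimental detection methods. Then, from the perspective of sequence features and around the basic idea of multi-feature fusion, it summarizes how statistical and deep learning methods have been applied to enhancer-promoter interaction prediction. Finally, it summarizes and analyzes the research hotspots and open challenges in this field.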
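As a rough illustration of what a sequence feature and a fused feature vector can look like, the sketch below counts k-mer frequencies in an enhancer and a promoter sequence and concatenates the two vectors. This is an assumption made for illustration only, not a method taken from the paper: the function names `kmer_counts` and `fuse_pair_features`, the choice of k-mer frequencies, the concatenation step, and the example sequences are all hypothetical.

```python
from itertools import product


def kmer_counts(seq, k=3):
    """Return normalized k-mer frequencies over the alphabet ACGT (hypothetical helper)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other symbols
            counts[j] += 1
            total += 1
    if total:
        counts = [c / total for c in counts]
    return counts


def fuse_pair_features(enhancer_seq, promoter_seq, k=3):
    """Concatenate the k-mer vectors of both elements (illustrative fusion only)."""
    return kmer_counts(enhancer_seq, k) + kmer_counts(promoter_seq, k)


# Made-up example sequences; real inputs would be genomic enhancer/promoter regions.
features = fuse_pair_features("ACGTACGTGGCA", "TTGACATATAAT", k=2)
print(len(features))
```

A vector like `features` could then be passed to any standard classifier; the deep learning approaches the abstract mentions typically learn such representations from the raw sequence instead of fixing them by hand.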