Computer Science, 2020, Vol. 47, Issue (5): 64-71. doi: 10.11896/jsjkx.191100027
HU Yu-jia (胡宇佳), GAN Wei (甘伟), ZHU Min (朱敏)
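Abstract: Studying the mechanisms of enhancer-promoter interactions helps us understand gene regulatory relationships. This understanding can in turn reveal disease-associated genes and offer new ideas and methods for diagnosis and treatment. Traditional biological assays are costly and time-consuming, and their limited resolution makes it difficult to identify individual enhancer-promoter interactions precisely. Solving biological problems with computational methods has become an active research topic in recent years. Such methods use complex network architectures to learn sequence features and spatial structure automatically from data, and can thereby predict enhancer-promoter interactions accurately. This paper first reviews the current state of traditional experimental detection methods. It then takes a sequence-feature perspective and, centred on the basic idea of multi-feature fusion, summarizes how statistical and deep learning methods have been applied to enhancer-promoter interaction prediction. Finally, it summarizes and analyzes the research hotspots and open challenges in the field.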
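To make "multi-feature fusion of sequence features" concrete, here is a minimal, hypothetical sketch. It does not reproduce any specific surveyed model. It computes a normalized k-mer frequency vector for the enhancer and for the promoter and fuses them by simple concatenation (early fusion). The function names `kmer_frequencies` and `fuse_pair_features`, the default k = 3, and concatenation as the fusion step are all illustrative assumptions.

```python
from __future__ import annotations

from itertools import product


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Normalized counts of every DNA k-mer (4**k entries, order A, C, G, T).

    Hypothetical helper for illustration; windows containing N or other
    ambiguous bases are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:
            counts[index[kmer]] += 1
            total += 1
    # Normalize so sequences of different lengths are comparable.
    return [c / total for c in counts] if total else counts


def fuse_pair_features(enhancer: str, promoter: str, k: int = 3) -> list[float]:
    """Early fusion (assumed): concatenate enhancer and promoter k-mer vectors."""
    return kmer_frequencies(enhancer, k) + kmer_frequencies(promoter, k)


if __name__ == "__main__":
    features = fuse_pair_features("ACGTACGTNNACG", "TTTGACGCA", k=2)
    print(len(features))  # 32 = 2 * 4**2
```

Any standard classifier can consume the fused vector, and further feature groups can be appended the same way. Deep learning models usually take one-hot encoded sequences instead and learn such features themselves.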