Computer Science, 2020, Vol. 47, Issue (11A): 35-39. doi: 10.11896/jsjkx.200600057
郭茂祖1,2, 杨帅1,2, 赵玲玲3
GUO Mao-zu1,2, YANG Shuai1,2, ZHAO Ling-ling3
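Abstract: With its low sequencing cost, high accuracy, and broad coverage, RNA-Seq has become an important method for transcriptome analysis, providing new means for studying gene expression patterns, detecting disease biomarkers, investigating crop stress resistance, and supporting molecular breeding. However, the massive volume of data that RNA-Seq produces also poses challenges for data analysis, and how to process and analyze RNA-Seq data effectively has become a focus of bioinformatics research. This paper introduces the RNA-Seq-based transcriptome analysis workflow, which comprises RNA-Seq data preprocessing, differential expression analysis, and high-level analysis. RNA-Seq data preprocessing performs quality control and quantification on the raw sequencing data; differential expression analysis screens genes, usually with either statistical or machine learning methods; high-level analysis further processes the differentially expressed genes, determining gene functions and regulatory networks through enrichment analysis and related techniques. Finally, the development of RNA-Seq-based transcriptome analysis methods is discussed.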
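The three stages named in the abstract (quantification, differential expression screening, enrichment analysis) can be illustrated with a toy sketch. This is a simplified illustration under stated assumptions, not the methods surveyed in the paper: counts-per-million scaling stands in for quantification, a plain log2 fold-change threshold stands in for a proper statistical or machine learning test, and a hypergeometric tail probability stands in for enrichment analysis. All function names, gene names, and thresholds are hypothetical.

```python
import math

def cpm(counts):
    """Scale one sample's raw read counts to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}

def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control); the pseudocount avoids log(0)."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }

def select_de_genes(lfc, threshold=1.0):
    """Keep genes whose absolute log2 fold change reaches the threshold.

    Assumption: a bare threshold replaces a real significance test.
    """
    return sorted(gene for gene, value in lfc.items() if abs(value) >= threshold)

def hypergeom_enrichment_p(n_universe, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: all genes considered; n_in_set: genes annotated to the
    functional category; n_selected: differentially expressed genes;
    n_overlap: selected genes that fall in the category.
    """
    upper = min(n_in_set, n_selected)
    total = math.comb(n_universe, n_selected)
    tail = sum(
        math.comb(n_in_set, k) * math.comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy data (hypothetical): raw counts for three genes in two conditions.
control_cpm = cpm({"geneA": 100, "geneB": 200, "geneC": 700})
treated_cpm = cpm({"geneA": 400, "geneB": 200, "geneC": 400})
de_genes = select_de_genes(log2_fold_change(control_cpm, treated_cpm))
enrichment_p = hypergeom_enrichment_p(n_universe=20, n_in_set=5, n_selected=4, n_overlap=3)
```

In practice each stage would typically be handled by dedicated tools with replicate-aware statistics and multiple-testing correction; the sketch only shows how the output of one stage feeds the next.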