Computer Science, 2020, Vol. 47, Issue (11A): 35-39. doi: 10.11896/jsjkx.200600057

• Artificial Intelligence •

Transcriptome Analysis Method Based on RNA-Seq

GUO Mao-zu1,2, YANG Shuai1,2, ZHAO Ling-ling3   

    1 School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
    2 Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044, China
    3 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  • Online: 2020-11-15 Published: 2020-11-17
  • About author: GUO Mao-zu, born in 1966, professor, Ph.D. supervisor, is a member of the China Computer Federation. His main research interests include machine learning, smart cities and bioinformatics.
    ZHAO Ling-ling, born in 1980, Ph.D. supervisor. Her main research interests include machine learning, smart cities and bioinformatics.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61532014, 61871020), the Key Project of the Science and Technology Plan of Beijing Municipal Commission of Education (KZ201810016019), the Beijing University High-level Innovation Team Building Plan Project (IDHT20190506), the National Key R&D Program of China (2016YFC0901902-5) and the 2020 Graduate Innovation Project (PG202005).

Abstract: RNA-Seq technology has become an important method of transcriptome analysis because of its low cost, high precision and wide coverage. It provides new means for studying gene expression patterns, detecting disease biomarkers, researching crop stress resistance and supporting molecular breeding. However, the massive volume of data generated by RNA-Seq also poses challenges for data analysis, and how to process and analyze RNA-Seq data effectively has become a hot topic in bioinformatics research. This paper introduces the transcriptome analysis process based on RNA-Seq technology, which comprises RNA-Seq data preprocessing, differential expression analysis and high-level analysis. Data preprocessing performs quality control and quantification on the raw sequencing data. Differential expression analysis screens genes, usually with statistical or machine learning methods. High-level analysis further processes the differentially expressed genes and determines gene function and regulatory networks through enrichment analysis and other means. Finally, the development prospects of RNA-Seq-based transcriptome analysis methods are discussed.
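
As a rough illustration of the quantification and gene-screening steps summarized in the abstract, the following minimal Python sketch normalizes raw read counts to counts per million (CPM) and keeps genes whose absolute log2 fold change between two conditions exceeds a threshold. The function names, the pseudocount, the threshold of 1.0 and the toy counts are all assumptions made for illustration and are not taken from the paper. A plain fold-change filter is also only a stand-in here: it is not a statistical test and does not account for replicate variability.

```python
import math


def counts_per_million(counts):
    """Scale one sample's raw read counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control); the pseudocount avoids log(0)."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }


def screen_genes(control_counts, treated_counts, min_abs_lfc=1.0):
    """Return genes whose absolute log2 fold change meets the threshold."""
    lfc = log2_fold_change(
        counts_per_million(control_counts),
        counts_per_million(treated_counts),
    )
    return {gene: value for gene, value in lfc.items() if abs(value) >= min_abs_lfc}


# Hypothetical toy counts for one control and one treated sample.
control = {"geneA": 100, "geneB": 500, "geneC": 400}
treated = {"geneA": 400, "geneB": 300, "geneC": 300}
selected = screen_genes(control, treated)
```

With these toy counts only geneA passes the filter, because its normalized abundance rises roughly fourfold while the other two genes change by less than a factor of two.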

Key words: Differential expression analysis, Enrichment analysis, Machine learning, RNA-Seq, Transcriptome analysis

CLC Number: 

  • TP391