Computer Science ›› 2018, Vol. 45 ›› Issue (12): 308-312. doi: 10.11896/j.issn.1002-137X.2018.12.049

• Interdisciplinary and Frontier •

De Novo Transcriptome Assembly Algorithm Based on RNA-Seq Datasets

WU Si-wen, LI Jing, ZHANG Shao-qiang

  1. (College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China)
  • Received: 2017-10-30  Online: 2018-12-15  Published: 2019-02-25
  • About the authors: WU Si-wen (1994-), female, master's student, CCF member, main research interest: computational bioinformatics; LI Jing (1990-), female, M.S., main research interest: computational bioinformatics; ZHANG Shao-qiang (1976-), male, professor, main research interest: computational bioinformatics, E-mail: zhangshaoqiang@tjnu.edu.cn (corresponding author).
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61572358) and the Natural Science Foundation of Tianjin (16JCYBJC23600).

Abstract: Transcriptome assembly is an important part of genome sequencing and functional annotation. To improve the accuracy and efficiency of transcriptome assembly, this paper presents StepLink, a new de novo transcriptome assembly algorithm. Its main innovations are two concepts, the leftmost k-mer and the right k-mer (a k-mer being a short sequence of length k), and the use of a hash of hashes (a two-level hash table) to store each pair of adjacent k-mers, which makes assembly faster and more accurate. StepLink was applied to human, dog and mouse sequencing datasets from the SRA database, and the experimental results show that it is more efficient than other existing algorithms.

Key words: De novo assembly algorithm, K-mer, RNA-Seq, Transcriptome
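The abstract names the data structure (a hash of hashes holding adjacent k-mer pairs) but not StepLink's exact rules, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that a stored pair is two consecutive k-mers of the same read, offset by one base, that contigs are seeded from each read's first (leftmost) k-mer, and that extension greedily follows the successor supported by the most reads. The function names `build_pair_table`, `extend` and `assemble` are hypothetical.

```python
from collections import defaultdict


def build_pair_table(reads, k):
    """Two-level hash table: table[kmer][next_kmer] -> number of supporting reads.

    Assumption: a stored pair is two consecutive k-mers of one read, offset by
    one base (overlapping by k - 1). StepLink's leftmost/right k-mer rule may differ.
    """
    table = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k):
            table[read[i:i + k]][read[i + 1:i + 1 + k]] += 1
    return table


def extend(table, seed, max_len=10_000):
    """Greedily extend seed, always following the best-supported successor k-mer."""
    contig, current, seen = seed, seed, {seed}
    while len(contig) < max_len:
        successors = table.get(current)  # .get() does not create missing keys
        if not successors:
            break
        # Highest count wins; sorting first makes ties pick the lexicographically smallest.
        nxt = max(sorted(successors), key=successors.__getitem__)
        if nxt in seen:  # stop instead of looping around a cycle
            break
        contig += nxt[-1]  # consecutive k-mers differ by one trailing base
        seen.add(nxt)
        current = nxt
    return contig


def assemble(reads, k):
    """Hypothetical driver: seed from each read's first k-mer, skip covered seeds."""
    table = build_pair_table(reads, k)
    contigs, used = [], set()
    for read in reads:
        seed = read[:k]
        if len(seed) < k or seed in used:
            continue
        contig = extend(table, seed)
        used.update(contig[i:i + k] for i in range(len(contig) - k + 1))
        contigs.append(contig)
    return contigs


reads = ["ACGTACGGA", "TACGGATTC", "GGATTCCAA"]
print(assemble(reads, k=4))  # → ['ACGTACGGATTCCAA']
```

With k = 4 the three overlapping toy reads merge into a single contig. The outer hash gives constant-time lookup of a k-mer and the inner hash gives constant-time lookup of each candidate successor with its read support. A practical assembler would also need sequencing-error handling, branch resolution for alternative splicing isoforms and reverse-complement k-mers, none of which this sketch attempts.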

CLC number: TP301.6