Computer Science ›› 2019, Vol. 46 ›› Issue (5): 36-43.doi: 10.11896/j.issn.1002-137X.2019.05.005

Previous Articles     Next Articles

Survey on Sequence Assembly Algorithms in High-throughput Sequencing

ZHOU Wei-xing, SHI Hai-he   

  1. (College of Computer Information and Engineering,Jiangxi Normal University,Nanchang 330022,China)
  • Received:2018-08-09 Revised:2018-12-13 Published:2019-05-15

Abstract: High-throughput sequencing technology is a new sequencing method developed after the first generation sequencing technology,also known as next-generation sequencing technology.Different from the automatic and semi-automatic capillary sequencing method based on Sanger,the high-throughput sequencing technology adopts the parallel sequencing technology based on pyrosequencing.It not only conquers the shortcomings of high cost,low throughput and low speed of the first generation sequencing technology,but also meets the demands of the rapid development of modern molecular biology and genomics with low cost,high throughput and fast speed.Compared with the first generation sequencing data,high-throughput sequencing data are characterized by short lengths,uneven coverage and low accuracy,and the third-generation sequencing technology adopts more efficient single molecular real-time sequencing and Nanopore sequencing technology as well as the principle of sequencing and synthesis,which has the advantages of high throughput,low cost and long sequencing data.Therefore,in order to obtain complete genome sequence,a technique is needed to assemble short sequencing reads into a complte single-stranded sequence of genes.In this case, the sequence assembly algorithm was proposed.Firstly,the development background of sequence assembly algorithms and the related concepts of high-throughput sequencing technology were introduced,and the advantages of high-throughput sequencing technology on sequence assembly were analyzed.Secondly,by summarizing the development of sequence assembly algorithms.The sequence assembly algorithms were illustrated,according to the algorithm classifications,respectively,by greedy strategy,Overlap-Layout-Consensus (OLC) strategy and De Bruijn Graph (DBG) strategy.Finally,the research direction and development trend of sequence assembly algorithms were discussed.

Key words: De bruijn graph, Greedy, High-throughput sequencing, Overlap-layout-consensus, Sequence assembly algorithms

CLC Number: 

  • TP301.6
[1]WARD R M,SCHMIEDER R,HIGHNAM G,et al.Big data challenges and opportunities in high-throughput sequencing[J].Systems Biomedicine,2013,1(1):29-34.
[2]PEARSON W R Y,LIPMAN D J.Improved tools for biological sequence analysis[J].Proceedings of the National Academy of Sciences of the United Sates of America,1988,85(46):16138-16143.
[3]ALTSCHUL S F,MADDEN T L,SCHÄFFER A A,et al.
Gapped BLAST and PSI-BLAST:a new generation of protein database search programs[J].Nucleic Acids Research,1997,25(17):3389-3402.
[4]LARKIN M A,BLACKSHIELDS G,BROWN N P,et al.Clus-tal W and Clustal X version 2.0[J].Bioinformatics,2007,23(21):2947-2948.
[5]DARLING A E,MAU B,PERNA N T.progressive Mauve:Mul-tiple Genome Alignment with Gene Gain,Loss and Rearrangement[J].PLOS One,2010,5(6):e11147.
[6]BLANCHETTE M,KENT W J,RIEMER C,et al.Aligningmultiple genomic sequences with the threaded blockset aligner[J].Genome Research,2004,14(4):708-715.
[7]CANTOR C R.Orchestrating the Human Genome Project[J].Science,1990,248(4951):49-51.
[8]CONSORTIUM T G P.A global reference for human genetic variation,the 1000 Genomes Project Consortium[J].Nature,2015,526:68-74.
[9]CONSORTIUM T U.The UK10K project identifies rare va-riants in health and disease[J].Nature,2015,526(7571):82-90.
[10]WATSON M.Illuminating the future of DNA sequencing[J].Genome Biology,2014,15(2):108.
[11]Sanger F,Nicklen S,Coulson A R.DNA sequencing with chain-terminating inhibitors[J].Proceedings of the National Academy of Sciences of the United States of America,1977,74(12):5463-5467.
[12]MOROZOVA O.Applications of next-generation sequencingtechnologies in functional genomics[J].Genomics,2008,92(5):255-264.
[13]WOOLEY J C,GODZIK A,FRIEDBERG I.A Primer on Meta-genomics[J].PLOS Computational Biology,2010,6(2):e1000667.
[14]WU X,ZHU X,WU G Q,et al.Data Mining with Big Data[J].IEEE Transactions on Knowledge & Data Engineering,2014,26(1):97-107.
[15]FONSECA N A,RUNG J,BRAZMA A,et al.Tools for mapping high-throughput sequencing data[J].Bioinformatics,2012,28(24):3169-3177.
[16]NIEDRINGHAUS T P,MILANOVA D,KERBY M B,et al.Landscape of Next-Generation Sequencing Technologies[J].Analytical Chemistry,2011,83(12):4327-4341.
[17]HARRIS T D,BUZBY P R,BABCOCK H,et al.Single-molecule DNA sequencing of a viral genome[J].Science,2008,320(5872):106-109.
[18]FLUSBERG B A,WEBSTER D,LEE J,et al.Direct detection of DNA methylation during single-molecule,real-time sequencing[J].Nature Methods,2010,7(6):461-465.
[19]RUSK N.Cheap third-generation sequencing[J].Nature Me-thods,2009,6(4):244-244.
[20]CHIN C S,PELUSO P,SEDLAZECK F J,et al.Phased Diploid Genome Assembly with Single Molecule Real-Time Sequencing[J].Nature Methods,2016,13(12):1050-1054.
[21]HEEREMA S J,DEKKER C.Graphene nanodevices for DNA sequencing[J].Nature Nanotechnology,2016,11(2):127-136.
[22]HEATHER J M,CHAIN B.The sequence of sequencers:The history of sequencing DNA[J].Genomics,2016,107(1):1-8.
[23]SHENDURE J,JI H.Next-generation DNA sequencing[J].Nature Biotechnology,2008,26(10):1135-1145.
[24]JACKMAN S D,VANDERVALK B P,MOHAMADI H,et al.ABySS 2.0:resource-efficient assembly of large genomes using a Bloom filter[J].Genome Research,2017,27(5):768-777.
[25]ZIMIN A V,STEVENS K A,CREPEAU M W,et al.An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing[J].GIGA Science,2017,6(1):1-4.
[26]MOSTOVOY Y,LEVYSAKIN M,LAM J,et al.A hybrid approach for de novo human genome sequence assembly and phasing[J].Nature Methods,2016,13(7):587-590.
[27]LIU Y,BERTIL S,MASKELL D L.Parallelized short read assembly of large genomes using de Bruijn graphs[J].BMC Bioinformatics,2011,12(1):354.
[28]WARREN R L,SUTTON G G,JONES S J M,et al.Assembling millions of short DNA sequences using SSAKE[J].Bioinforma-tics,2007,23(4):500-501.
[29]JECK W R,REINHARDT J A,BALTRUS D A,et al.Extending assembly of short DNA sequences to handle error[J].Bioinformatics,2007,23(21):2942-2944.
[30]DOHM J C,LOTTAZ C,BORODINA T,et al.SHARCGS,a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing[J].Genome Research,2007,17(11):1697-1706.
[31]FARRANT G K,HOEBEKE M,PARTENSKY F,et al.WiseScaffolder:an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data[J].BMC Bioinformatics,2015,16(1):281.
[32]CAO M D,NGUYEN S H,GANESAMOORTHY D,et al.Scaffolding and completing genome assemblies in real-time withna-nopore sequencing[J].Nature Communications,2017,8:14515.
[33]HIEU T N,ZIAUR R M,HE L,et al.Complete De NovoAssembly of Monoclonal Antibody Sequences[J].Scientific Reports,2016,6:31730.
[34]MIN L,LIAO Z,HE Y,et al.ISEA:Iterative Seed-Extension Algorithm for De Novo Assembly Using Paired-End Information and Insert Size Distribution[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2017,14(4):916-925.
[35]MYERS E W,SUTTON G G,DELCHER A L,et al.A Whole-Genome Assembly of Drosophila[J].Science,2000,287(5461):2196-2204.
[36]DE L B M,MCCOMBIE W R.Assembling genomic DNA se-quences with PHRAP[J].Current Protocols in Bioinformatics,2007,17(1):11.4.1-11.4.15.
[37]MARGULIES M,EGHOLM M,ALTMAN W E,et al.Genome sequencing in microfabricated high-density picolitre reactors[J].Nature,2005,437(7057):376-380.
[38]KAMATH G M,SHOMORONY I,XIA F,et al.HINGE:long-read assembly achieves optimal repeat resolution[J].Genome Research,2017,27(5):747-756.
[39]MYERS G.Efficient Local Alignment Discovery amongst Noisy Long Reads[M]∥Algorithms in Bioinformatics.Berlin:Sprin-ger,2014:52-67.
[40]KOREN S,WALENZ B P,BERLIN K,et al.Canu:scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation[J].Genome Research,2017,27(5):722-736.
[41]YE C,HILL C M,WU S,et al.DBG2OLC:Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies[J].Scientific Reports,2016,6(X):31900.
[42]YE C,MA Z S,CANNON C H,et al.Exploiting sparseness in de novo genome assembly[J].BMC Bioinformatics,2012,13(S6):S1.
[43]LI H.Minimap and miniasm:fast mapping and de novo assembly for noisy long sequences[J].Bioinformatics,2015,32(14):2103-2110.
[44]VASER R,SOVIC' I,NAGARAJAN N,et al.Fast and accurate de novo genome assembly from long uncorrected reads[J].Genome Research,2017,27(5):737-746.
[45]JANSEN H J,LIEM M,JONGRAADSEN S A,et al.Rapid de novo assembly of the European eel genome from nanopore sequencing reads[J].Scientific Reports,2017,7(1):7213.
[46]BAAIJENS J A, AZE A, RIVALS E,et al.De novo assembly of viral quasispecies using overlap graphs[J].Genome Research,2017,27(5):835-848.
[47]PENG G,JI P,ZHAO F.A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes[J].Genome Biology,2016,17(1):232.
[48]CAMERON D L,SCHROEDER J,PENINGTON J S,et al.
GRIDSS:sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly[J].Genome Research,2017,27(12):1-11.
[49]WEISENFELD N I,KUMAR V,SHAH P,et al.Direct determination of diploid genome sequences[J].Genome Research,2017,27(5):757-767.
[50]BUTLER J,MACCALLUM I,KLEBER M,et al.ALLPATHS:de novo assembly of whole-genome shotgun microreads[J].Genome Research,2008,18(5):810-820.
[51]PEVZNER P A,TANG H,WATERMAN M S.An Eulerianpath approach to DNA fragment assembly[J].Proceedings of the National Academy of Sciences of the United Sates of America,2001,98(17):9748-9753.
[52]GNERRE S,JAFFE D B.High-quality draft assemblies of mammalian genomes from massively parallel sequence data[J].Proceedings of the National Academy of Sciences of the United Sates of America,2011,108(4):1513-1518.
[53]LI R,ZHU H,RUAN J,et al.De novo assembly of human genomes with massively parallel short read sequencing[J].Genome Research,2010,20(2):265-272.
[54]LUO R,LIU B,XIE Y,et al.SOAPdenovo2:an empirically improved memory-efficient short-read de novo assembler[J].GIGA Science,2012,1(1):18.
[55]XIE Y,WU G,TANG J,et al.SOAPdenovo-Trans:de novotranscriptome assembly with short RNA-Seq reads[J].Bioinformatics,2014,30(12):1660-1666.
[56]PENG Y,LEUNG H C M,YIU S M,et al.IDBA-UD:a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth[J].Bioinformatics,2012,28(11):1420-1428.
[57]CHITSAZ H,YEE-GREENBAUM J L,TESLER G,et al.Efficient de novo assembly of single-cell bacterial genomes from short-read data sets[J].Nature Biotechnology,2011,29(10):915-921.
[58]PENG Y,LEUNG H C M,YIU S M,et al.IDBA-tran:a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels[J].Bioinformatics,2013,29(13):326-334.
[59]NURK S,MELESHKO D,KOROBEYNIKOV A,et al.metaSPAdes:a new versatile metagenomics assembler[J].Genome Research,2017,27(5):824-834[60]BANKEVICH A,NURK S,ANTIPOV D,et al.SPAdes:A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing[J].Journal of Computational Biology,2012,19(5):455-477.
[61]PRJIBELSKI A D,VASILINETC I,BANKEVICH A,et al.ExSPAnder:a universal repeat resolver for DNA fragment assembly[J].Bioinformatics,2014,30(12):293-301.
[62]SAFONOVA Y,BANKEVICH A,PEVZNER P A.dipSPAdes:Assembler for Highly Polymorphic Diploid Genomes[C]∥International Conference on Research in Computational Molecular Biology.New York:Springer-Verlag,2014:265-279.
[63]ANTIPOV D,KOROBEYNIKOV A,MCLEAN J S,et al.hybridSPAdes:an algorithm for hybrid assembly of short and long reads[J].Bioinformatics,2015,32(7):1009-1015.
[64]VASILINETC I,PRJIBELSKI A D,GUREVICH A,et al.Assembling short reads from jumping libraries with large insert sizes[J].Bioinformatics,2015,31(20):3262-3268.
[1] ZHANG Chong-yu, CHEN Yan-ming, LI Wei. Task Offloading Online Algorithm for Data Stream Edge Computing [J]. Computer Science, 2022, 49(7): 263-270.
[2] LIU Zhang-hui, ZHENG Hong-qiang, ZHANG Jian-shan, CHEN Zhe-yi. Computation Offloading and Deployment Optimization in Multi-UAV-Enabled Mobile Edge Computing Systems [J]. Computer Science, 2022, 49(6A): 619-627.
[3] LI Xiao-dong, YU Zhi-yong, HUANG Fang-wan, ZHU Wei-ping, TU Chun-yu, ZHENG Wei-nan. Participant Selection Strategies Based on Crowd Sensing for River Environmental Monitoring [J]. Computer Science, 2022, 49(5): 371-379.
[4] ZHANG Xin-ming, LI Shuang-qian, LIU Yan, MAO Wen-tao, LIU Shang-wang, LIU Guo-qi. Coyote Optimization Algorithm Based on Information Sharing and Static Greed Selection [J]. Computer Science, 2020, 47(5): 217-224.
[5] SUN Zhi-qiang, WAN Liang, DING Hong-wei. Android Malware Detection Method Based on Deep Autoencoder Network [J]. Computer Science, 2020, 47(4): 298-304.
[6] HU Jun-qin, ZHANG Jia-jun, HUANG Yin-hao, CHEN Xing, LIN Bing. Computation Offloading Scheduling Technology for DNN Applications in Edge Environment [J]. Computer Science, 2020, 47(10): 247-255.
[7] LIAO Yong, YANG Xin-yi, XIA Mao-han, WANG Bo, LI Shou-zhi, SHEN Xuan-fan. Improved Tomlinson-Harashima Precoding Based on Greedy Algorithm in High-speed Mobile Scenarios [J]. Computer Science, 2019, 46(8): 121-126.
[8] HUANG De-ling,YAN Yu-song,PENG Da-qin. Geographic Routing Protocol Based on Prediction for Urban Vehicular Ad Hoc Networks [J]. Computer Science, 2019, 46(7): 74-80.
[9] LI Zhuo, XU Zhe, CHEN Xin, LI Shu-qin. Location-related Online Multi-task Assignment Algorithm for Mobile Crowd Sensing [J]. Computer Science, 2019, 46(6): 102-106.
[10] ZHENG Fei-feng, JIANG Juan, MEI Qi-huang. Study on Stowage Optimization in Minimum Container Transportation Cost [J]. Computer Science, 2019, 46(6): 239-245.
[11] YU Jian-jun, WU Chun-ming. Offline Static Virtual Network Mapping Algorithm Based on Tabu Search Genetic Optimization [J]. Computer Science, 2019, 46(12): 114-119.
[12] GAO Si-qi,XING Yu-xuan,XIAO Nong,LIU Fang. Greedy Frog Leaping Algorithm for 01 Knapsack Problem [J]. Computer Science, 2018, 45(7): 73-77.
[13] CHEN Jin-yin, HU Ke-ke,LI Yu-wei. Research on UAV Multi-point Navigation Algorithm Based on MB-RRT* [J]. Computer Science, 2018, 45(6A): 85-90.
[14] ZHU Rui-chao,QIAN Wen-hua,PU Yuan-yuan, XU Dan. Texture Synthesis Based on Self-similarity Matching [J]. Computer Science, 2018, 45(6A): 215-219.
[15] HU Qing-cheng, ZHANG Yong, XING Chun-xiao. K-clique Heuristic Algorithm for Influence Maximization in Social Network [J]. Computer Science, 2018, 45(6): 32-35.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!