Computer Science, 2015, Vol. 42, Issue (9): 235-239. doi: 10.11896/j.issn.1002-137X.2015.09.045

CGDNA: An Ensemble De Novo Genome Assembly Algorithm Based on Clustering Graph

XU Kui, CHEN Ke, XU Jun, TIAN Jia-lin, LIU Hao and WANG Yu-fan   

  • Online: 2018-11-14 Published: 2018-11-14

Abstract: The ultimate goal of genome sequencing is to determine the complete DNA sequence of an organism, which is the basis for genetic research and disease diagnosis. In general, genome sequencing can be divided into two steps: first, generating and determining the DNA fragments experimentally; second, assembling the fragments into a full genome by computational methods. Although Sanger sequencing successfully resolved the human genome, it has been replaced by next-generation sequencing because of its high cost. Next-generation sequencing offers high throughput, high coverage and low cost, but it also produces shorter reads and more errors, which makes assembly more challenging. It has been reported that the results of different assembly algorithms are complementary and that no single algorithm consistently outperforms the others, so this study aims to integrate the assembly results produced by multiple algorithms. We propose an ensemble algorithm, CGDNA, based on a clustering graph. It proceeds in four stages: building an index, mapping reads, clustering contigs, and building the clustering graph. Experimental results show that CGDNA increases two standard metrics (the largest scaffold and the scaffold N50) by 50% compared with the state-of-the-art algorithms Velvet, ABySS and SOAPdenovo, outperforming each of these single algorithms. Moreover, the performance of CGDNA is expected to improve further as more base algorithms are added. The proposed algorithm substantially improves the quality of the assembly result, reduces the difficulty of genetic analysis and accelerates genome research.

Key words: De novo genome assembly, Ensemble algorithm, Clustering graph, Indexing, Read mapping
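
The two evaluation metrics named in the abstract, the largest scaffold and the scaffold N50, have standard definitions that a short sketch can make concrete. The Python below is illustrative only and is not taken from the CGDNA implementation: the function names `n50` and `largest_scaffold` are hypothetical, and the lengths in the example are made-up values, not results from the paper. N50 is the length L such that scaffolds of length L or longer together cover at least half of the total assembled length.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Hypothetical helper for illustration; not part of CGDNA.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings coverage to >= 50%.
        if 2 * running >= total:
            return length


def largest_scaffold(lengths):
    """Return the length of the longest scaffold (hypothetical helper)."""
    return max(lengths)


# Made-up lengths, purely for illustration.
example = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example))               # 8: the two longest cover 16 of 32 bases
print(largest_scaffold(example))  # 8
```

Because N50 depends on the whole length distribution, merging several short scaffolds into one long scaffold can raise it even when the total assembled length is unchanged, which is why it is usually reported alongside the largest scaffold.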

