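Abstract: Because high-throughput sequencing technologies produce massive volumes of gene read data, parallel gene read mapping algorithms have become an active research topic in recent years. Building on a study of gene matching algorithms, this paper proposes an improved MapReduce-based gene read mapping algorithm. By incorporating biological information into the read mapping process and using the Hadoop distributed cache mechanism, it reduces the algorithm's complexity to some extent. Experiments on an Arabidopsis thaliana gene dataset show that the algorithm effectively improves execution efficiency and reduces execution time.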