Computer Science ›› 2015, Vol. 42 ›› Issue (8): 82-85.

• 2014 Jiangsu Province Artificial Intelligence Academic Conference •

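Improved Gene Read Mapping Algorithm Based on MapReduce

TU Jin-jin, YANG Ming and GUO Li-na

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61272222, 61003116), the Key and Major Special Program of the Natural Science Foundation of Jiangsu Province (BK2011005), the Natural Science Foundation of Jiangsu Province (BK2011782), and the Graduate Research and Innovation Program of Jiangsu Higher Education Institutions (CXLX12_0415).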

Abstract: High-throughput sequencing technology generates massive volumes of gene reads, which has made parallel read mapping algorithms a research hotspot in recent years. This paper studies gene matching algorithms and proposes an improved gene read mapping algorithm based on MapReduce. By incorporating biological information into the read mapping process and using the Hadoop distributed cache mechanism, the algorithm reduces its complexity to some extent. Experiments on an Arabidopsis gene data set show that the proposed algorithm effectively improves execution efficiency and reduces running time.

Key words: Read mapping, MapReduce, SeqMap
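
The abstract does not spell out the algorithm, so the following is only a minimal single-process sketch of how read mapping can be phrased as a map step and a reduce step. It is not the authors' method. All function names (build_seed_index, map_read, reduce_hits, run_job) and the parameters k and max_mismatches are hypothetical. A plain in-memory dictionary stands in for side data that a Hadoop job would ship to every mapper through the distributed cache, exact k-mer seeds with a Hamming-distance check stand in for the matching rule, and the biological information the paper integrates is not modeled.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a, b):
    """Count mismatching characters between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read_id, read, reference, index, k, max_mismatches):
    """Map phase: emit (read_id, (mismatches, start)) for each candidate hit."""
    seen = set()
    for offset in range(0, len(read) - k + 1, k):  # non-overlapping seeds
        for seed_pos in index.get(read[offset:offset + k], ()):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(reference) or start in seen:
                continue
            seen.add(start)
            mismatches = hamming(read, reference[start:start + len(read)])
            if mismatches <= max_mismatches:
                yield read_id, (mismatches, start)


def reduce_hits(read_id, hits):
    """Reduce phase: keep the hit with the fewest mismatches (leftmost on ties)."""
    return read_id, min(hits)


def run_job(reads, reference, k=4, max_mismatches=1):
    """Simulate one job: shared index, map over reads, group by key, reduce."""
    index = build_seed_index(reference, k)  # assumption: plays the role of cached side data
    grouped = defaultdict(list)             # assumption: plays the role of the shuffle step
    for read_id, read in reads.items():
        for key, value in map_read(read_id, read, reference, index, k, max_mismatches):
            grouped[key].append(value)
    return dict(reduce_hits(key, values) for key, values in grouped.items())
```

Calling run_job with a dictionary of read identifiers to read strings and a reference string returns, for each mappable read, a (mismatches, start) pair; reads with no candidate within max_mismatches are simply absent from the result. Building the index once and sharing it with every map call mirrors the idea of distributing read-only reference data to all mappers instead of shuffling it.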

