计算机科学 ›› 2017, Vol. 44 ›› Issue (1): 75-79.doi: 10.11896/j.issn.1002-137X.2017.01.014

• 2016第六届中国数据挖掘会议 • 上一篇    下一篇

基于枚举策略的三倍体个体单体型重建算法

张倩,吴璟莉   

  1. 广西师范大学计算机科学与信息工程学院 桂林541004,广西师范大学计算机科学与信息工程学院 桂林541004;广西师范大学广西多源信息挖掘与安全重点实验室 桂林541004;广西区域多源信息集成与智能处理协同创新中心 桂林541004
  • 出版日期:2018-11-13 发布日期:2018-11-13
  • 基金资助:
    本文受国家自然科学基金项目(61363035,61502111),广西自然科学基金项目(2015GXNSFAA139288,2013GXNSFBA019263,2012GXNSFAA053219),“八桂学者”工程专项,广西多源信息挖掘与安全重点实验室系统性研究基金项目(14-A-03-02,15-A-03-02)资助

Triploid Individual Haplotype Reconstruction Algorithm Based on Enumeration Strategy

ZHANG Qian and WU Jing-li   

  • Online:2018-11-13 Published:2018-11-13

摘要: 求解三倍体个体单体型对于探索三倍体物种的遗传特性和表型差异等方面的研究具有重要的推动作用。针对带基因型信息的最少错误更正(MEC/GI)模型,提出了一种基于枚举策略的三倍体个体单体型重建算法EHTR。该算法依次重建3条单体型上的每一个单核苷酸多态性位点取值,对于给定位点,首先根据其基因型取值枚举该位点的3种单体型取值情况,然后选择片段支持度最高的取值作为该位点的重建值,算法的总时间复杂度为O(mn+mlogm+cnl)。采用CELSIM和MetaSim两种测序片段模拟生成器生成实验测试数据,在片段覆盖率、错误率、单片段长度、单体型长度和单体型海明距离等参数的不同设置下,对算法EHTR,GTIHR,W-GA和Q-PSO的重建率和运行时间进行对比分析。实验结果显示,算法EHTR在不同的参数设置下均能以更短的运行时间获得更高的重建率。

关键词: 序列分析,三倍体,单体型,基因型,最少错误更正,枚举

Abstract: Determining triploid individual haplotype plays an important role in promoting the study of exploring genetic traits and phenotypic differences of triploid species.In this paper,based on the minimum error correction with genotype information (MEC/GI) model,an enumeration strategy based enumeration haplotyping triploid (EHTR) algorithm was proposed for solving triploid individual haplotype reconstruction problem.The EHTR algorithm reconstructs the SNP sites of the three haplotypes one after another.When reconstructing a given SNP site,it enumerates three kinds of SNP values in terms of the genotype of the site,and chooses one with the most high support degree coming from the SNP fragments that are covering the corresponding SNP site.The total time complexity is O(mn+mlogm+cnl).In the experiments,two kinds of simulators CELSIM and MetaSim were invoked to generate SNP fragments.The reconstruction rate and running time were compared and analyzed among algorithms EHTR,GTIHR,W-GA and Q-PSO with different parameters setting,such as fragment coverage,error rate,single fragment length,haplotype length and haplotype hamming distance.Under different parameter setting,the EHTR algorithm can obtain higher reconstruction rate in shorter running time,which is proved by a number of experiments.

Key words: Sequence analysis,Triploid,Haplotype,Genotype,Minimum error correction,Enumeration

[1] AYEH K O.Expressed sequence tags(ESTs) and single nucleotide polymorphisms(SNPs):Emerging molecular marker tools for improving agronomic traits in plant biotechnology[J].African Journal of Biotechnology,2008,7(4):331-341.
[2] OLLITRAULT P,TEROL J,GARCIA-LOR A,et al.SNP mi-ning in C.clementina BAC and sequences:transferability in the Citrus genus(Rutaceae),phylogenetic inferences and perspectives for genetic mapping [J].BMC Genomics,2012,13(13):2-9.
[3] CUENCA J,ALEZA P,NAVARRO L,et al.Assignment ofSNP allelic configuration in polyploids using competitive allele-specific PCR:application to citrus triploid progeny [J].Annals of Botany,2013,111(4):731-742.
[4] HAYWARD A,MASON A S,DALTON-MORGAN J,et al.SNP discovery and applications in Brassica napus [J].Journal of Plant Biotechnology,2012,39(1):49-61.
[5] ADESOYE A,MMEKA E,VROH B.Single Nucleotide Polymorphism Markers Discovery in Musa Spp(Plantain Landraces,AAB Genome) and its Potentials for Use in Gibberellic Acid and Parthenocarpy Trait Mapping [J].Journal of Plant Molecular Biology & Biotechnology,2012,3(1):9-21.
[6] WANG R S,WU L Y,LI Z P,et al.Haplotype reconstruction from SNP fragments by minimum error correction [J].Bioinformatics,2005,21(10):2456-2462.
[7] LI Z P,WU L Y,ZHAO Y Y,et al.A dynamic programming algorithm for the k- haplotyping problem [J].Acta Mathematicae Applicate Sinica,English Series,2006,22(3):405-412.
[8] WU J L,WANG J X,CHEN J E.A parthenogenetic algorithm for single individual SNP haplotyping [J].Engineering Applications of Artificial Intelligence,2009,22(3):401-406.
[9] QIAN W Y,YANG Y J,YANG N N,et al.Particle swarm optimization for SNP haplotype reconstruction problem [J].Applied Mathematics and Computation,2008,196(1):266-272.
[10] WU Jing-li,WANG Zhao-can.Genetic Algorithm for SolvingTriploid Individual Haplotype Reconstruction Problem [J].Journal of Chinese Computer Systems,2014,35(4):840-844.(in Chinese) 吴璟莉,王兆灿.求解三倍体个体单体型重建问题的遗传算法[J].小型微型计算机系统,2014,35(4):840-844.
[11] WU Jing-li.Research on the combinatorial optimization problem in detection of genetic diversities [D].Changsha:Central South University,2008.(in Chinese) 吴璟莉.遗传多态性检测中组合优化问题的研究[D].长沙:中南大学,2008.
[12] LIPPERT R,SCHWARTZA R,LANCIA G,et al.Algorithmic strategies for the SNPs haplotype assembly problem [J].Brie-fings in Bioinformatics,2002(3):23-31.
[13] FILIPPO G.A comparison of several algorithms for the single individual SNP haplotyping reconstruction problem [J].Bioinformatics,2010,26(18):2217-2225.
[14] MYERS G.A dataset generator for whole genome shotgun sequencing [C]∥Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology.California:AAAI Press,1999:202-210.
[15] RICHTER D C,OTT F,et al.MetaSim-A Sequencing Simulator for Genomics and Metagenomics [J].PLOS ONE,2008,3(10):e3373.
[16] PANCONESI A,SOZIO M.Fast hare:a fast heuristic for single individual SNP haplotype reconstruction[C]∥Proceedings of 4th Workshop on Algorithms in Bioinformatics.Heiderberg:Springer-Verlag,2004:266-277.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!