Computer Science, 2017, Vol. 44, Issue (1): 80-83. doi: 10.11896/j.issn.1002-137X.2017.01.015


Integrated Feature Mining Based Approach for Calling Genomic Deletions

ZHANG Xiao-dong, LING Cheng and GAO Jing-yang   

Online: 2018-11-13  Published: 2018-11-13

Abstract: With the development and application of next-generation sequencing (NGS) technology, many sequencing-based methods for calling genomic deletions have been proposed. However, any single calling method is limited in its range of application and falls short in both precision and sensitivity. To address these problems, this paper proposes an integrated, feature-mining-based approach for calling deletions that combines multiple detection theories with a machine learning algorithm. First, several different callers are run to call deletions, and their results are merged into an initial deletion call set. Then, following a variety of detection strategies, features of each deletion in the initial call set are extracted from the NGS data. Finally, a machine learning model is trained to identify false positive deletions in the initial call set, and removing them yields the final call set. Experimental results show that, compared with single callers such as Pindel and SVseq2, the proposed approach achieves higher precision and higher sensitivity simultaneously. Compared with directly merging multiple deletion call sets, it significantly improves precision at the cost of a slight loss in sensitivity.

Key words: Deletion, Feature mining, Integrated detection
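The three-stage pipeline in the abstract (merge the outputs of several callers, extract features for each candidate deletion, then filter false positives with a trained classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract specifies neither the merging criterion nor the feature set, so the 50% reciprocal-overlap rule, the two features (number of supporting callers and a read-depth ratio inside the call), the caller names "callerA"/"callerB", and all toy data are hypothetical. The reference list cites LIBSVM, which suggests the original model is an SVM; to stay within the Python standard library, this sketch swaps in plain logistic regression trained by batch gradient descent.

```python
import math
from dataclasses import dataclass, field

# Minimal sketch of a merge -> feature -> classify pipeline for deletion calls.
# ASSUMPTIONS (not from the paper): 50% reciprocal-overlap merging, the two
# features below, logistic regression instead of an SVM, and all toy data.


@dataclass
class Deletion:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    callers: set = field(default_factory=set)

    def key(self):
        return (self.chrom, self.start, self.end)


def reciprocal_overlap(a, b, min_frac=0.5):
    """True if a and b share a chromosome and their overlap covers at least
    min_frac of each interval (assumed merging criterion)."""
    if a.chrom != b.chrom:
        return False
    ov = min(a.end, b.end) - max(a.start, b.start)
    if ov <= 0:
        return False
    return ov >= min_frac * (a.end - a.start) and ov >= min_frac * (b.end - b.start)


def merge_call_sets(call_sets, min_frac=0.5):
    """Union of per-caller calls; a call overlapping an already merged call is
    folded into it (the first call's coordinates are kept)."""
    merged = []
    for caller, calls in call_sets.items():
        for chrom, start, end in calls:
            new = Deletion(chrom, start, end, {caller})
            for m in merged:
                if reciprocal_overlap(m, new, min_frac):
                    m.callers.add(caller)
                    break
            else:
                merged.append(new)
    return merged


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # zip stops at len(x); the bias w[-1] is added separately.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean logistic loss; w[-1] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def filter_calls(merged, depth_ratio, w, threshold=0.5):
    """Keep calls scored as true deletions. Hypothetical features:
    [number of supporting callers, depth inside call / flanking depth]."""
    return [d for d in merged
            if predict(w, [len(d.callers), depth_ratio[d.key()]]) >= threshold]


# Toy usage: two hypothetical callers, hand-made labels and depth ratios.
call_sets = {
    "callerA": [("chr1", 1000, 2000), ("chr1", 5000, 5300)],
    "callerB": [("chr1", 1050, 1990), ("chr2", 700, 900)],
}
merged = merge_call_sets(call_sets)
X = [[2, 0.1], [3, 0.0], [2, 0.5], [1, 1.0], [1, 0.9], [1, 1.1]]
y = [1, 1, 1, 0, 0, 0]  # 1 = true deletion, 0 = false positive
w = train_logistic(X, y)
ratios = {("chr1", 1000, 2000): 0.1, ("chr1", 5000, 5300): 1.0,
          ("chr2", 700, 900): 0.95}
kept = filter_calls(merged, ratios, w)
print([d.key() for d in kept])  # prints [('chr1', 1000, 2000)]
```

In this toy run the merged set holds three candidates, and only the chr1 call supported by both callers with a strongly reduced depth ratio survives filtering; the single-caller calls with near-normal depth are discarded. A real pipeline would compute features such as discordant read pairs, split reads and depth directly from the alignments.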

[1] EICHLER E E,NICKERSON D A,ALTSHULER D,et al.Completing the map of human genetic variation[J].Nature,2007,447(7141):161-165.
[2] CONRAD D F,PINTO D,REDON R,et al.Origins and functional impact of copy number variation in the human genome[J].Nature,2010,464(7289):704-712.
[3] PAK C H,DANKO T,ZHANG Y,et al.Human neuropsychiatric disease modeling using conditional deletion reveals synaptic transmission defects caused by heterozygous mutations in NRXN1[J].Cell Stem Cell,2015,17(3):316-328.
[4] LEE M Y,WON H S,BAEK J W,et al.Variety of prenatally diagnosed congenital heart disease in 22q11.2 deletion syndrome[J].Obstetrics & Gynecology Science,2014,57(1):11-16.
[5] ALKAN C,COE B P,EICHLER E E.Genome structural variation discovery and genotyping[J].Nature Reviews Genetics,2011,12(5):363-376.
[6] YE K,SCHULZ M H,LONG Q,et al.Pindel:a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads[J].Bioinformatics,2009,25(21):2865-2871.
[7] ZHANG J,WANG J,WU Y.An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data[J].BMC Bioinformatics,2012,13(Suppl 6):1-11.
[8] RAUSCH T,ZICHNER T,SCHLATTL A,et al.DELLY:structural variant discovery by integrated paired-end and split-read analysis[J].Bioinformatics,2012,28(18):i333-i339.
[9] CHEN K,WALLIS J W,MCLELLAN M D,et al.BreakDancer:an algorithm for high-resolution mapping of genomic structural variation[J].Nature Methods,2009,6(9):677-681.
[10] ABYZOV A,URBAN A E,SNYDER M,et al.CNVnator:an approach to discover,genotype,and characterize typical and atypical CNVs from family and population genome sequencing[J].Genome Research,2011,21(6):974-984.
[11] HORMOZDIARI F,HAJIRASOULIHA I,DAO P,et al.Next-generation Variation Hunter:combinatorial algorithms for transposon insertion discovery[J].Bioinformatics,2010,26(12):i350-i357.
[12] LI H,DURBIN R.Fast and accurate short read alignment with Burrows-Wheeler transform[J].Bioinformatics,2009,25(14):1754-1760.
[13] LI H,HANDSAKER B,WYSOKER A,et al.The sequence alignment/map format and SAMtools[J].Bioinformatics,2009,25(16):2078-2079.
[14] CHANG C C,LIN C J.LIBSVM:A library for support vector machines[J].ACM Transactions on Intelligent Systems and Technology (TIST),2011,2(3):389-396.
[15] 1000 Genomes Project Consortium.An integrated map of genetic variation from 1092 human genomes[J].Nature,2012,491(7422):56-65.
