计算机科学 ›› 2016, Vol. 43 ›› Issue (5): 261-264.doi: 10.11896/j.issn.1002-137X.2016.05.049
张懿璞,茹锋,王飚
ZHANG Yi-pu, RU Feng and WANG Biao
摘要: 寻找序列中的最近子串对挖掘基因中特殊的功能位点和了解基因调控关系有着重要意义。提出了一种新的基于种子集求精的改进的期望最大化算法SCEM来对基因序列中的最近子串进行寻找。通过对输入序列聚类,将数据集分解为若干种子集,再使用改进的期望最大化算法对各种子集进行求精,SCEM最终可寻找到序列中的最近子串。真实数据和模拟数据实验表明,SCEM算法可以寻找到真实的最近子串,与随机投影等流行算法相比也能保证较高的性能和效率,并且可以有效解决较长的最近子串寻找问题。
[1] Park P J.ChIP-seq:advantages and challenges of a maturing technology[J].Nat Rev Genet,2009,10(10):669-80 [2] Zhang Yi-pu,Wang Ping.A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets[C]∥BioMed Research International,2015.2015 [3] Bailey T L,Elkan C.Fitting a mixture model by expectation maximization to discover motifs in bipolymers[C]∥ Second International Conference on Intelligent Systems for Molecurlar Bio-logy.1994:28-36 [4] Zambelli F,Pesole G,Pavesi G.Motif discovery and transcription factor binding sites before and after the next-generation sequencing era[J].Briefings in Bioinformatics,2013,14(2):225-237 [5] Zhang Yi-pu,Huo H,Yu Qiang.A Heuristic Cluster-based EM Algorithm for the Planted(l,d) Problem[J].Journal of Bioinformatics and Computational Biology,2013,11(4):1350009 [6] Pevzner P A,Sze S H.Combinatorial approaches to finding subtle signals in DNA sequences[C]∥ISMB.2000:269-278 [7] Reid J,Wernisch L.STEME:efficient EM to find motifs in large data sets[J].Nucleic Acids Research,2011,39(18):126 [8] Buhler J,Tompa M.Finding motifs using random projections[J].Journal of Computational Biology,2002,9(2):225-242 [9] Chen Kun,Zhang Xiao-jun.MCI Clustering Algorithm SolvingPlanted (1,d) Motif Identification[J].Journal of Henan University(Natural Science),2015,45(1):102-107(in Chinese) 陈昆,张小骏.MCL聚类算法求解植入(l,d)模体识别问题[J].河南大学学报(自然科学版),2015,45(1):102-107 [10] Sun De-cai,Wang Xiao-xia.Research on Filtering Algorithm of Approximate String Matching[J].Computer Technology and Development,2015(4):171-176(in Chinese) 孙德才,王晓霞.近似串匹配过滤算法研究[J].计算机技术与发展,2015(4):171-176 [11] Xu Yong-kang,Yang Guang-lu,Lu Song-feng,et al.Approxi-mate string matching algorithm based on compressed suffixarray[J].Computer Engineering and Applications,2015,51(23):139-142(in Chinese) 胥永康,杨光露,路松峰,等.基于压缩后缀数组的近似字符串匹配算法[J].计算机工程与应用,2015,1(23):139-142 [12] Chan H L,Lam T W,Sung W K,et al.Compressed indexes for approximate string matching[J].Algorithmica,2010,58(2):263-281 [13] Atallah M J,Grigorescu E,Wu Y.A lower-variance randomized algorithm for approximate string matching[J].Information Processing Letters,2013,113(18):690-692 |
No related articles found! |
|