Computer Science, 2016, Vol. 43, Issue (5): 261-264. doi: 10.11896/j.issn.1002-137X.2016.05.049

Fast Seed Set Refinement Algorithm for Closest Substrings Discovery

ZHANG Yi-pu, RU Feng and WANG Biao   

Online: 2018-12-01; Published: 2018-12-01

Abstract: Finding the closest substrings in a set of sequences is crucial for mining specific functional sites in genes and for understanding gene regulatory relationships. This paper proposes SCEM, a novel algorithm that refines seed sets with an improved expectation maximization (EM) algorithm. SCEM first clusters the input sequences to divide the dataset into a series of seed sets. It then refines these seed sets with the improved EM algorithm and finally reports the closest substrings. Experiments on real and simulated datasets demonstrate that the algorithm finds the actual closest substrings and achieves good performance and efficiency compared with popular algorithms such as Random Projection. Moreover, SCEM can also solve the problem of finding long closest substrings effectively.
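The abstract describes the refinement step only at a high level. As a rough illustration, the following Python sketch shows a generic EM refinement of a seed set under a one-occurrence-per-sequence (OOPS) model with a uniform background, in the style of classic EM motif finders. It is not SCEM's improved EM, whose modifications are not described here. It also omits the clustering stage that builds the seed sets. The function names (`pwm_from_seed_set`, `refine_seed_set`, `consensus`, `hamming`), the DNA alphabet, the pseudocount of 0.5, and the fixed count of 20 iterations are all hypothetical choices.

```python
import math

# Hypothetical sketch: generic one-occurrence-per-sequence (OOPS) EM refinement,
# not the improved EM used by SCEM. DNA alphabet and uniform background assumed.
ALPHABET = "ACGT"


def pwm_from_seed_set(seed_set, pseudocount=0.5):
    """Build a position weight matrix (one dict per column) from equal-length l-mers."""
    width = len(seed_set[0])
    pwm = []
    for j in range(width):
        counts = {c: pseudocount for c in ALPHABET}
        for kmer in seed_set:
            counts[kmer[j]] += 1
        total = sum(counts.values())
        pwm.append({c: counts[c] / total for c in ALPHABET})
    return pwm


def site_log_odds(site, pwm):
    """Log-likelihood ratio of a site under the PWM versus a uniform background."""
    return sum(math.log(pwm[j][c] * len(ALPHABET)) for j, c in enumerate(site))


def refine_seed_set(seqs, seed_set, iterations=20, pseudocount=0.5):
    """Refine a seed set by EM; return the final PWM and one best site per sequence."""
    pwm = pwm_from_seed_set(seed_set, pseudocount)
    width = len(pwm)
    if any(len(seq) < width for seq in seqs):
        raise ValueError("every sequence must be at least as long as the seeds")
    for _ in range(iterations):
        counts = [{c: pseudocount for c in ALPHABET} for _ in range(width)]
        for seq in seqs:
            # E-step: posterior probability of each start position in this sequence.
            scores = [site_log_odds(seq[i:i + width], pwm)
                      for i in range(len(seq) - width + 1)]
            top = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - top) for s in scores]
            z = sum(weights)
            # Accumulate expected base counts for each motif column.
            for i, w in enumerate(weights):
                for j, c in enumerate(seq[i:i + width]):
                    counts[j][c] += w / z
        # M-step: renormalize the expected counts into a new PWM.
        pwm = [{c: col[c] / sum(col.values()) for c in ALPHABET} for col in counts]
    # Report the highest-scoring window of each sequence as its candidate site.
    sites = []
    for seq in seqs:
        best = max(range(len(seq) - width + 1),
                   key=lambda i: site_log_odds(seq[i:i + width], pwm))
        sites.append(seq[best:best + width])
    return pwm, sites


def consensus(pwm):
    """Most probable base in each PWM column."""
    return "".join(max(col, key=col.get) for col in pwm)


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))
```

Consider a run that starts from the single seed `ACGTA` on sequences with planted variants `ACGTA`, `ACGTT`, and `ACCTA`. It recovers those three sites, and each lies within Hamming distance 1 of the consensus. The closest substring problem is defined by Hamming distance rather than likelihood. For that reason, a fuller implementation would also check the reported sites against an explicit distance bound.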

Key words: Closest substring, Seed set, Cluster refinement, Expectation maximization

