计算机科学 ›› 2017, Vol. 44 ›› Issue (2): 270-274.doi: 10.11896/j.issn.1002-137X.2017.02.045
孙焘,朱晓明
SUN Tao and ZHU Xiao-ming
摘要: 多条序列的最长公共子序列可以代表多条序列的公共信息,其在诸多领域里有着重要的应用,如信息检索、基因序列匹配等。求解多条序列的最长公共子序列是著名的NP难问题,本质为多解问题。一些近似算法虽然时间复杂度较低,但只能求出单解,对于有多解的序列集合,求得的结果信息量损失较大。因此提出一个新的近似算法来解决最长公共子序列问题。算法引入了代数结构“格”,通过动态规划求解出两条序列的公共格,并递归求解当前格与当前序列的公共格。公共格中的路径保存了多条公共子序列使得最终求解出的最长公共子序列为多个。对算法的相关定理给出了理论证明,并通过实验验证了算法的正确性。
[1] MAIER D.The complexity of some problem on subsequencesand superseuqences[J].Journal of the ACM (JACM), 1978,25 (2):322-366. [2] BERGROTH L,HAKONEN H,RAITA T.A survey of longest common subsequence algorithms[C]∥Processings.Seventh International Symposium on String Processing and Information Retrieval,2000(SPIRE 2000).IEEE,2000:39-48. [3] ATTWOOD T,FINDLAY J.Fingerprinting g-protein-coupledreceptors[J].Protein engineering,1994,7(2):195-203. [4] SANKOFF D,BLANCHETTE M.Phylogenetic invariants forgenome rearrangements[J].Journal of Computational Biology,1999,6(3/4):431-445. [5] SHUKLA A,AGARWAL S.A relative position based algorithm to find out the longest common subsequence from multiple biological sequences[C]∥Proceedings of the 2010 International Conference on Computer and Communication Technology (ICCCT).2010:496-502. [6] RIZVI S,AGARWAL P.A new index-based parallel algorithm for finding longest common subsequence in multiple dna sequences[C]∥International Conference in Cognitive Systems.Citeseer,2005. [7] HAKATA K,IMAI H.Algorithms for the longest commonsubsequence problem for multiple strings based on geometric maxima[J].Optimization Methods and Software,1998,10(2):233-260. [8] BOURQUE G,PEVZNER P A.Genome-scale evolution:reconstructing gene orders in the ancestral species[J].Genome research,2002,12(1):26-36. [9] SHERIDAN R P,VENKATARAGHAVAN R.A systematicsearch for protein signature sequences[J].Proteins-Structure Function and Bioinformatics,1992,14(1):16-28. |
No related articles found! |
|