计算机科学 ›› 2015, Vol. 42 ›› Issue (4): 244-248.doi: 10.11896/j.issn.1002-137X.2015.04.050

• 人工智能 • 上一篇    下一篇

一种带有通配符和长度约束模式匹配问题的动态剪枝算法

王海平,戴 玮,郭 丹   

  1. 合肥工业大学计算机与信息学院 合肥230009,陆军军官学院基础部 合肥230031,合肥工业大学计算机与信息学院 合肥230009
  • 出版日期:2018-11-14 发布日期:2018-11-14
  • 基金资助:
    本文受国家自然科学基金:港澳学者合作研究基金项目(61229301),国家自然科学基金项目(60828005),博士后面上基金项目(2012M511403),安徽省自然科学基金(2013AKZR0082)资助

Dynamic Pruning Algorithm for Pattern Matching with Wildcards and Length Constraints

WANG Hai-ping, DAI Wei and GUO Dan   

  • Online:2018-11-14 Published:2018-11-14

摘要: 近年来,随着生物信息学、信息检索等领域的发展,串模式匹配问题被不断扩展。其中,具有代表性的是在模式中引入可变长度的通配符而形成带有通配符的模式匹配(PMWL)。该问题定义的灵活性给用户提供了方便,却也造成了求解上的困难。因此,如何在多项式时间内得到更好的匹配解成为研究的焦点。提出了一种启发式的小兵算法。小兵算法通过将PMWL问题转化为路径搜索问题,并借鉴动态剪枝思想,在算法搜索的过程中动态地将不可能的匹配位置剪枝,从而提高解的质量。实验在真实DNA序列上进行,并人工生成了196个模式。结果表明,相比于目前最有效的SAIL算法,小兵算法在绝大多数的尾部有重复字符的模式中可以获得更好的匹配解。

关键词: 模式匹配,通配符,剪枝,约束

Abstract: Recently,with the development of bioinformatics,network security and information retrieval,wildcards are playing an increasingly key role in pattern ma-tching problems.Among these,the challenging problem of pattern ma-tching with wildcards and length constraints (PMWL) further extends the flexibility of pattern matching,thus bringing convenience to the user but producing the expense of higher computational complexity.Therefore,how to get a better matching solution in polynomial time has been well studied and developed.Based on the previous work,we presented a heuristic Pawn algorithm for PMWL.After transforming the PMWL problem into path search problem,a dynamic strategy is adopted with a pruning procedure in an alternating iterative manner.Experiments demonstrate that,comparing with the state-of-the-art algorithm,Pawn algorithm can improve the number of matching occurrences in most cases of 196 artificial patterns with certain characteristic and the DNA sequences.

Key words: Pattern matching,Wildcard,Pruning,Constraints

[1] Navarro G,Raffinot M.Flexible pattern matching in strings:Practical on-line search algorithms for texts and biological sequences[M].Cambridge,UK:Cambridge University Press,2002
[2] Han J,Cheng H,Xin D,et al.Frequent pattern mining:CurrentStatus and Future Directions[J].Data Mining and Knowledge Discovery,2007,15(1):55-86
[3] Fischer M J,Paterson M S.String matching and other products[R].Complexity of computation,1974:113-125
[4] Cole J R,Chai B,Marsh T L,et al.The ribosomal database project(RDP-11):Sequences and tools for high-throughput rRNA analysis[J].Nucleic Acids Research,2005,33 (1):294-296
[5] 徐小双,冯玉才,王锋,等.路径分区编码优化小枝查询[J].计算机科学,2010,37(3):182-204
[6] Xie F,Wu X D,Hu X G,et al.Sequential Pattern Mining with Wildcards[C]∥22nd IEEE International Conference on Tools with Artificial Intelligence (ICTAI).2010:241-247
[7] Guo D,Hu X G,Xie F,et al.Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph[J].Applied Intelligence,2013,39:57-74
[8] Bille P,Gortz I,Vildhoj H,et al.String matching with variable length gaps[J].Theoretical Computer Science,2012,443:25-34
[9] Chen G,Wu X D,Zhu X Q,et al.Efficient String Matching with Wildcards and Length Constraints[J].Knowledge and Information Systems,2006,10(4):399-419
[10] Min F,Wu X D,Lu Z.Pattern matching with independent wildcard gaps[C]∥Eighth IEEE International Conference on Dependable,Autonomic and Secure Computing (DASC-2009).2009:194-199
[11] Zhang M,Kao B,Cheung D,et al.Mining periodic patterns with gap requirement from sequences[C]∥Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data.2005:622-633
[12] Zhu X Q,Wu X D.Mining complex patterns across sequences with gap requirements[C]∥Proceedings of 2007 International Joint Conference on Artificial Intelligence.2007:2934-2940
[13] Guo D,Hong X L,Hu X G,et al.A Bit-Parallel Algorithm for Sequential Pattern Matching with Wildcards[J].Cybernetics and Systems,2011,42(6):382-401
[14] 王海平,胡学钢,谢飞,等.模式特征对带有通配符和长度约束的模式匹配问题的影响[J].模式识别与人工智能,2012,25(6):1013-1021
[15] 运正佳,李铁男,杨晓春.支持带有通配符的字符串匹配算法[J].计算机科学与探索,2010,4(11):984-995
[16] Muthukrishnan S,Palem K.Non-standard stringology:algo-rithms and complexity[C]∥Proceedings of the Twenty-sixth Annual ACM Symposium on Theory of Computing.1994:770-779
[17] Manber U,Baeza-Yates R.An algorithm for string matchingwith a sequence of don’t cares[J].Information Processing Letters,1991,37(3):133-136
[18] Zhu X Q,Wu X D.Discovering relational patterns across multiple databases[C]∥Proceedings of IEEE 23rd International Conference on Data Engineering.2007:726-735
[19] Zhang M,Kao B,Cheung D W,et al.Mining periodic patterns with gap requirement from sequences[C]∥Proceedings of ACM SIGMOD.2005:623-633

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!