计算机科学 ›› 2014, Vol. 41 ›› Issue (9): 269-273.doi: 10.11896/j.issn.1002-137X.2014.09.051

• 人工智能 • 上一篇    下一篇

带通配符的模式匹配问题及其解空间特征分析

项泰宁,郭丹,王海平,胡学钢   

  1. 合肥工业大学计算机与信息学院 合肥230009;合肥工业大学计算机与信息学院 合肥230009;合肥工业大学计算机与信息学院 合肥230009;合肥工业大学计算机与信息学院 合肥230009
  • 出版日期:2018-11-14 发布日期:2018-11-14
  • 基金资助:
    本文受国家自然科学基金:港澳学者合作研究基金项目(61229301),国家自然科学基金(61305062),博士后面上基金项目(2012M511403),安徽省自然科学基金(1308085QF102)资助

Characteristic Analysis of Pattern Matching with Wildcards Problem and its Solution Space

XIANG Tai-ning,GUO Dan,WANG Hai-ping and HU Xue-gang   

  • Online:2018-11-14 Published:2018-11-14

摘要: 随着生物信息学、信息检索等领域的发展,带有通配符和长度约束的模式匹配问题引起了广泛关注。该问题扩展了精确模式匹配问题,使匹配更加灵活,同时也增加了匹配的复杂性,极大地提高了非线性匹配算法的复杂度。求解该问题的匹配算法的效率与问题的解空间密切相关,而目前针对该问题的解空间及其特征尚缺乏系统的研究。鉴于此,描述了该问题的解空间,并分析了解空间的可分性。之后,提出解空间划分算法SPLIT,并分析了SPLIT的时间复杂性。实验部分以3个匹配算法为对照,在真实DNA数据集下,使用了5109组模式。实验结果表明,SPLIT不影响匹配解的结构,且可以有效降低非线性匹配算法的时间消耗。

关键词: 解空间,分割,模式匹配,通配符

Abstract: With the developments in bioinformatics and information retrieval,pattern matching with wildcards and length constraints has attracted wild attention.This problem extends the exact pattern matching,so it brings more flexibility as well as complexity.Meanwhile,the time complexity of non-linear algorithms is greatly increased.The solution space makes great difference to the efficiency of matching algorithms,but the characteristics of the problem solution space is still lack of systematic research.Therefore,the problem solution space was presented,and the divisibility of the space was analyzed.Afterwards,a solution space partitioning algorithm,named SPLIT,was proposed,and its time complexity was also illustrated.In our experiments,three pattern matching algorithms were used as baselines running on realDNA data sets with 5109 sets of patterns.Our empirical results verify that SPLIT can effectively reduce the time consumption of non-linear matching algorithms without influencing the matching solution.

Key words: Solution space,Partitioning,Pattern matching,Wildcard

[1] Chen Gong,Wu Xin-dong,Zhu Xing-quan,et al.Efficient string matching with wildcards and length constraints[J].Knowledge and Information Systems,2006,0(4):399-419
[2] Cole J R,Chai B,Marsh T L,et al.The ribosomal database project (RDP-11):Sequences and tools for high-throughput rRNA analysis[J].Nucleic Acids Research,2005,3(1):294-296
[3] Xie Fei,Wu Xin-dong,Hu Xue-gang,et al.Sequential PatternMining with Wildcards[C]∥22nd IEEE International Confe-rence on Tools with Artificial Intelligence (ICTAI).Arras,France,2010:241-247
[4] He Yu,Wu Xin-dong,Zhu Xing-quan,et al.Mining Frequent Patterns with Wildcards from Biological Sequences[C]∥IEEE International Conference on Information Reuse and Integration.Las Vegas,IL,2007:329-334
[5] Califf M E,Mooney R J.Bottom-up relational learning of pattern matching rules for information extraction[J].Journal of Machine Learning Research,2003,4(6):177-210
[6] Manber U,Baeza-Yates R.An algorithm for string matchingwith a sequence of don’t cares[J].Information Processing Letters,1991,7(3):133-136
[7] Fischer M J,Paterson M S.String matching and other products[R].Massachusetts Institute of Technology Cambridge Project MAC, 1974:113-125
[8] Zhang Ming-hua,Kao B,Cheung DW,et al.Mining periodic patterns with gap requirement from sequences[C]∥Proc.of ACM SIGMOD.Baltimore Maryland,2005:623-633
[9] Bille P,Grtz I L,Vildhj H,et al.String matching with variable length gaps[C]∥Proc.of 17th SPIRE.2010:385-394
[10] Muthukrishnan S,Krishna P.Non-stadard stringology:algo-rithms and complexity [C]∥Proc.of the twenty-sixth annual ACM symposium on Theory of computing.New York,NY,USA,1994
[11] Guo Dan,Hong Xiao-li,Hu Xue-gang,et al.A Bit-Parallel Algorithm for Sequential Pattern Matching with Wildcards[J].Cybernetics and Systems,2011,2(6):382-401
[12] 武优西,吴信东,江贺,等.一种求解MPMGOOC问题的启发式算法[J].计算机学报,2011,4(8):1452-1462
[13] 侯宝剑,谢飞,胡学钢,等.基于后缀树的带有通配符的模式匹配研究[J].计算机科学,2012,9(12):177-180
[14] 张雁,焦方正,卢昕玮,等.采用染色划分改进的RLS 算法及性能分析[J].软件学报,2011,2(10):2305-2316
[15] Zhang Li-hua,Xu Wen-li,Chang Cheng.Genetic algorithm for affine point pattern matching[J].Pattern Recognition Letters,2003,4:9-19
[16] Sagot M F,Viari A.A Double Combinatorial Approach to Dis-covering Patterns in Biological Sequence[C]∥Proc.of the 7th Symposium on Combinatorial Pattern Matching.Springer,1996:186-208
[17] Min Fan,Wu Xin-dong,Lu Zhen-yu.Pattern matching with independent wildcard gaps[C]∥Eighth IEEE International Conference on Dependable,Autonomic and Secure Computing(DASC-2009).Chengdu,China,2009:194-199
[18] National center for biotechnology information website.http://www.ncbi.nlm.nih.gov/

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!