计算机科学 ›› 2016, Vol. 43 ›› Issue (Z6): 51-54.doi: 10.11896/j.issn.1002-137X.2016.6A.011

• 智能计算 • 上一篇    下一篇

基于基本操作序列的编辑距离顺序验证

张润梁,牛之贤   

  1. 太原理工大学计算机科学与技术学院 太原030024,太原理工大学计算机科学与技术学院 太原030024
  • 出版日期:2018-12-01 发布日期:2018-12-01

Sequential Verification Algorithm to Compute Edit Distance Based on Edit Operation Sequence

ZHANG Run-liang and NIU Zhi-xian   

  • Online:2018-12-01 Published:2018-12-01

摘要: 两字符串的编辑距离是从一个串转换到另一个串所需要的最少基本操作数。编辑距离广泛应用于字符串近似匹配、字符串相似连接等领域。动态规划法利用编辑距离矩阵来计算两个串的编辑距离,需要计算矩阵中的所有元素,时间效率低。改进的方法改变了矩阵中元素的计算次序,减少了需要比对的元素,但仍需要比对一半以上的元素,时间效率还有待提高。提出基于基本操作序列的编辑距离顺序验证方法。首先,分析了基本操作序列的可列性,给出了列举基本操作序列的方法。然后依次顺序验证基本操作数从小到大的基本操作序列直到某一序列通过验证,得到其编辑距离。在阈值为2的字符串近似搜索实验中发现,所提方法比动态规划类方法具有更高的效率。

关键词: 近似串搜索,编辑距离,顺序验证,基本操作序列

Abstract: The edit distance between two strings is the minimum number of edit operations required to transform one into another.The edit distance is widely used in approximate string match,string similarity joins and etc.Dynamic programming algorithm(DPA) uses an edit distance matrix to compute the edit distance between two strings,which needs to compute all the elements in the matrix and has poor time efficiency.The progressive method changes the calculation orders of the elements to reduce the calculation numbers,which still needs to compute half of the elements and whose time efficiency needs to be improved.In our paper,we proposed a sequential verification algorithm to compute the edit distance based on the edit operation sequence.First,we analyzed the enumerable nature of edit operation sequence and gave a way to enumerate the sequences.Then,we got the result though sequentially verifying the edit operation sequences ordered by its edit operation numbers until the verification being successful.Experiments on approximate string search with threshold 2 show that compared with the DPA,our method achieves high performance.

Key words: Approximate string search,Edit distance,Sequential verification,Edit operation sequence

[1] Deng D,Li G,Feng J,et al.Top-k string similarity search with edit-distance constraints[C]∥ICDE.2013:925-936
[2] Bayardo R,Ma Y,Srikant R.Scaling up all-pairs similaritysearch[C]∥WWW Conference.2007
[3] Chaudhuri S,Ganjam K,Ganti V,et al.Robust and EfficientFuzzy Match for Online Data Cleaning[C]∥SIGMOD.2003:313-324
[4] Jiang Y,Li G,Feng J,et al.String similarity joins:An experimental evaluation[J].Proceedings of the VLDB Endowment,2014,7(8):625-636
[5] Li C,Lu J,Lu Y.Efficient merging and filtering algorithms for approximate string searches[C]∥ICDE.2008: 257-266
[6] 纳瓦罗.柔性字符串匹配[M].北京:电子工业出版社,2007
[7] 姜华,韩安琪,王美佳,等.基于改进编辑距离的字符串相似度求解算法[J].计算机工程,2014,40(1):222-227
[8] 黄亮,赵泽茂,梁兴开.基于编辑距离的Web数据挖掘[J].计算机应用,2012,32(6):1662-1665
[9] 薛晔伟,沈钧毅,张云.一种编辑距离算法及其在网页搜索中的应用[J].西安交通大学学报,2008,42(12):1450-1454

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!