Computer Science ›› 2016, Vol. 43 ›› Issue (Z6): 51-54.doi: 10.11896/j.issn.1002-137X.2016.6A.011

Previous Articles     Next Articles

Sequential Verification Algorithm to Compute Edit Distance Based on Edit Operation Sequence

ZHANG Run-liang and NIU Zhi-xian   

  • Online:2018-12-01 Published:2018-12-01

Abstract: The edit distance between two strings is the minimum number of edit operations required to transform one into another.The edit distance is widely used in approximate string match,string similarity joins and etc.Dynamic programming algorithm(DPA) uses an edit distance matrix to compute the edit distance between two strings,which needs to compute all the elements in the matrix and has poor time efficiency.The progressive method changes the calculation orders of the elements to reduce the calculation numbers,which still needs to compute half of the elements and whose time efficiency needs to be improved.In our paper,we proposed a sequential verification algorithm to compute the edit distance based on the edit operation sequence.First,we analyzed the enumerable nature of edit operation sequence and gave a way to enumerate the sequences.Then,we got the result though sequentially verifying the edit operation sequences ordered by its edit operation numbers until the verification being successful.Experiments on approximate string search with threshold 2 show that compared with the DPA,our method achieves high performance.

Key words: Approximate string search,Edit distance,Sequential verification,Edit operation sequence

[1] Deng D,Li G,Feng J,et al.Top-k string similarity search with edit-distance constraints[C]∥ICDE.2013:925-936
[2] Bayardo R,Ma Y,Srikant R.Scaling up all-pairs similaritysearch[C]∥WWW Conference.2007
[3] Chaudhuri S,Ganjam K,Ganti V,et al.Robust and EfficientFuzzy Match for Online Data Cleaning[C]∥SIGMOD.2003:313-324
[4] Jiang Y,Li G,Feng J,et al.String similarity joins:An experimental evaluation[J].Proceedings of the VLDB Endowment,2014,7(8):625-636
[5] Li C,Lu J,Lu Y.Efficient merging and filtering algorithms for approximate string searches[C]∥ICDE.2008: 257-266
[6] 纳瓦罗.柔性字符串匹配[M].北京:电子工业出版社,2007
[7] 姜华,韩安琪,王美佳,等.基于改进编辑距离的字符串相似度求解算法[J].计算机工程,2014,40(1):222-227
[8] 黄亮,赵泽茂,梁兴开.基于编辑距离的Web数据挖掘[J].计算机应用,2012,32(6):1662-1665
[9] 薛晔伟,沈钧毅,张云.一种编辑距离算法及其在网页搜索中的应用[J].西安交通大学学报,2008,42(12):1450-1454

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!