计算机科学 ›› 2023, Vol. 50 ›› Issue (9): 130-138.doi: 10.11896/jsjkx.220700105

• 数据库&大数据&数据科学 • 上一篇    下一篇

高效低索引的图相似性搜索算法

邱珍, 郑朝晖   

  1. 苏州大学计算机科学与技术学院 江苏 苏州 215006
    江苏省网络空间安全工程实验室 江苏 苏州 215006
  • 收稿日期:2022-07-11 修回日期:2022-11-11 出版日期:2023-09-15 发布日期:2023-09-01
  • 通讯作者: 郑朝晖(zhengzh@suda.edu.cn)
  • 作者简介:(20205227108@stu.suda.edu.cn)
  • 基金资助:
    江苏省高校自然科学研究项目(19KJA550002)

Graph Similarity Search with High Efficiency and Low Index

QIU Zhen, ZHENG Zhaohui   

  1. School of Computer Science and Technology,Soochow University,Suzhou,Jiangsu 215006,China
    Jiangsu Province Cyberspace Security Engineering Laboratory,Suzhou,Jiangsu 215006,China
  • Received:2022-07-11 Revised:2022-11-11 Online:2023-09-15 Published:2023-09-01
  • About author:QIU Zhen,born in 1997,postgraduate,is a member of China Computer Federation.Her main research interests include data retrieval and network security.
    ZHENG Zhaohui,born in 1968,professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include data mining and network security.
  • Supported by:
    Natural Science Foundation of the Jiangsu Higher Education Institutions of China(19KJA550002).

摘要: 图相似性搜索是在给定的度量标准下查找与查询图相似的图集合,目前大多采用“过滤-验证”的计算框架。针对现有方法中过滤下界不紧密和索引空间占用较大等问题,提出了一种基于查询图分区的多层级过滤、低索引空间占用的图相似性搜索算法Z-Index。该算法首先通过全局粗粒度过滤得到预候选集;然后提出基于扩展概率的查询图分区算法,并采用层级过滤机制进一步精简候选集,增强下界紧密性;最后引入序列相似性差值计算序列中数据分布的稀疏度,提出分区压缩和差值压缩两种编码压缩算法,并据此构建“零”索引结构,降低索引空间开销。实验结果表明,Z-Index算法所得下界更加紧密,产生的候选集大小可减少50%左右,算法执行时间大大缩短,且该算法在索引空间占用极小的情况下仍具有可扩展性。

关键词: 图相似性搜索, 层级过滤, 扩展概率, 编码压缩, 查询图分区

Abstract: Graph similarity search is to search the graph set that is similar to query graph under a measurement,which adopts the “filtering-verification” framework.Aiming at the problems of the existing methods,such as the untight lower bound and the large index space,an improved graph similarity search algorithm(Z-Index) based on query graph partition with multi-level filtering and low index space is proposed.Firstly,the pre-candidate set is obtained by global coarse-grained filtering.Secondly,a query graph partitioning algorithm based on extension probability is proposed,and a hierarchical filtering mechanism is adopted to further shrink the candidate set,so as to enhance the tightness of the lower bound.Finally,the sequence similarity difference is introduced to compute the sparsity of the data contribution.Then partition compression and difference compression algorithm are proposed to construct “zero” index structure,so as to reduce the index space.Experimental results show that Z-Index algorithm has a tighter lower bound,and the candidate set size of Z-Index can be reduced about 50%.Moreover,the algorithm execution time is greatly reduced,and it still shows great scalability in the case of tiny index space.

Key words: Graph similarity search, Hierarchical filtering, Extension probability, Coding compression, Query graph partitioning

中图分类号: 

  • TP391
[1]ZHENG W G,ZOU L,LIAN X,et al.Graph similarity search with edit distance constraint in large graph databases[C]//ACM International Conference on Conference on Information & Knowledge Management.ACM,2013:1595-1600.
[2]ZHAO X,XIAO C,LIN X,et al.Efficient processing of graph similarity queries with edit distance constraints[J].The VLDB Journal,2013,22(6):727-752.
[3]ZHAO X,XIAO C,LIN X,et al.A partition-based approach to structure similarity search[J].The VLDB Journal,2013,7(3):169-180.
[4]GOUDA K,ARAFA M.An improved global lower bound for graph edit similarity search[J].Pattern Recognition Letters,2015,58(Jun.1):8-14.
[5]WESKAMP N,HULLERMEIER E,KUHN D,et al.Multiple graph alignment for the structural analysis of protein active sites[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2007,4(2):310-320.
[6]CHENG J,KE Y,FU W C,et al.Fast graph query processing with a low-cost index[J].The VLDB Journal,2011,20(4):521-539.
[7]WANG Y,FENG Z H,CHEN L,et al.Efficient similaritysearch for sets over graphs[J].IEEE Transactions on Know-ledge and Data Engineering,2021,33(2):444-458.
[8]SHIMOMURA L C,OYAMADA R S,VIEIRA M R,et al.Asurvey on graph-based methods for similarity searches in metric spaces [J].Information Systems,2020,95(Jan.):101507.1-10507.22.
[9]XIANG Y Z,TAN J X,HAN J S,et al.Survey of Graph Ma-tching Algorithms [J].Computer Science,2018,45(6):27-31.
[10]XU Z B,ZHANG K,NING L H,et al.Summary of Graph Edit Distance[J].Computer Science,2018,45(4):11-18.
[11]HART P E,NILSSON N J,RAPHAEL B.A formal basis for the heuristic determination of minimum cost paths[J].IEEE transactions on Systems Science and Cybernetics,1968,4(2):100-107.
[12]ABU-AISHEH Z,RAVEAUX R,RAMEL J Y,et al.An exact graph edit distance algorithm for solving pattern recognition problems[C]//4th International Conference on Pattern Recognition Applications and Methods.Lisbon,Portugal,SCITEPRESS,2015:271-278.
[13]ABU-AISHEH Z,RAVEAUX R,RAMEL J Y,et al.A Distri-buted Algorithm for Graph Edit Distance[C]//the Eighth International Conference on Advances in Databases,Knowledge and Data Applications.Lisbon,Portugal,2016:66-71.
[14]GOUDA K,HASSAAN M.CSI_GED:An efficient approach for graph edit similarity computation[C]//32nd International Conference on Data Engineering(ICDE).Helsinki,Finland,IEEE,2016:265-276.
[15]CHANG L,FENG X,LIN X,et al.Speeding up GED verification for graph similarity search[C]//36th International Confe-rence on Data Engineering(ICDE).Dallas,TX,USA,IEEE,2020:793-804.
[16]CHEN X,HUO H W,HUAN J,et al.An efficient algorithm for graph edit distance computation[J].Knowledge-Based Systems,2019,163:762-775.
[17]WANG G,WANG B,YANG X,et al.Efficiently indexing large sparse graphs for similarity search [J].IEEE Transactions on Knowledge and Data Engineering,2012,24(3):440-451.
[18]ZHENG W G,ZOU L,LIAN X,et al.Efficient graph similarity search over large graph databases[J].IEEE Transactions on Knowledge and Data Engineering,2014,27(4):964-978.
[19]ZHAO X,XIAO C,LIN X,et al.Efficient structure similaritysearches:a partition-based approach[J].The VLDB Journal,2018,27(1):53-78.
[20]DIEGO R,RAMIRO T,POLO V.An exact approach for the multi-constraint graph partitioning problem[J].EURO Journal on Computational Optimization,2020,8(3/4):289-308.
[21]LIANG Y J,ZHAO P X.Similarity search in graph databases:A Multi-Layered Indexing Approach[C]//2017 IEEE 33rd International Conference on Data Engineering(ICDE).San Diego,CA,USA,IEEE,2017:783-794.
[22]BLUMENTHAL D B,BORIA N,GAMPER J,et al.Comparing heuristics for graph edit distance computation[J].The VLDB Journal,2020,29(1):419-458.
[23]QIN J,XIAO C,WANG Y,et al.Generalizing the pigeonhole principle for similarity search in hamming space[J].IEEE Transactions on Knowledge and Data Engineering(TKDE),2019,33(2):489-505.
[24]PIBIRI G E,VENTURINI R.Techniques for inverted indexcompression[J].ACM Computing Surveys,2021,52(6):125:1-125:36.
[25]PIBIRI G E,VENTURINI R.On optimally partitioning variable-byte codes[J].IEEE Transactions on Knowledge and Data Engineering(TKDE),2020,32(9):1812-1823.
[26]GIULI O,ERMANN O,PIBIR I,et al.Clustered Elias-Fano indexes[J].ACM Transactions on Information Systems,2018,36(1):2.1-2.33.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!