计算机科学 ›› 2023, Vol. 50 ›› Issue (9): 130-138.doi: 10.11896/jsjkx.220700105
邱珍, 郑朝晖
QIU Zhen, ZHENG Zhaohui
摘要: 图相似性搜索是在给定的度量标准下查找与查询图相似的图集合,目前大多采用“过滤-验证”的计算框架。针对现有方法中过滤下界不紧密和索引空间占用较大等问题,提出了一种基于查询图分区的多层级过滤、低索引空间占用的图相似性搜索算法Z-Index。该算法首先通过全局粗粒度过滤得到预候选集;然后提出基于扩展概率的查询图分区算法,并采用层级过滤机制进一步精简候选集,增强下界紧密性;最后引入序列相似性差值计算序列中数据分布的稀疏度,提出分区压缩和差值压缩两种编码压缩算法,并据此构建“零”索引结构,降低索引空间开销。实验结果表明,Z-Index算法所得下界更加紧密,产生的候选集大小可减少50%左右,算法执行时间大大缩短,且该算法在索引空间占用极小的情况下仍具有可扩展性。
中图分类号:
[1]ZHENG W G,ZOU L,LIAN X,et al.Graph similarity search with edit distance constraint in large graph databases[C]//ACM International Conference on Conference on Information & Knowledge Management.ACM,2013:1595-1600. [2]ZHAO X,XIAO C,LIN X,et al.Efficient processing of graph similarity queries with edit distance constraints[J].The VLDB Journal,2013,22(6):727-752. [3]ZHAO X,XIAO C,LIN X,et al.A partition-based approach to structure similarity search[J].The VLDB Journal,2013,7(3):169-180. [4]GOUDA K,ARAFA M.An improved global lower bound for graph edit similarity search[J].Pattern Recognition Letters,2015,58(Jun.1):8-14. [5]WESKAMP N,HULLERMEIER E,KUHN D,et al.Multiple graph alignment for the structural analysis of protein active sites[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2007,4(2):310-320. [6]CHENG J,KE Y,FU W C,et al.Fast graph query processing with a low-cost index[J].The VLDB Journal,2011,20(4):521-539. [7]WANG Y,FENG Z H,CHEN L,et al.Efficient similaritysearch for sets over graphs[J].IEEE Transactions on Know-ledge and Data Engineering,2021,33(2):444-458. [8]SHIMOMURA L C,OYAMADA R S,VIEIRA M R,et al.Asurvey on graph-based methods for similarity searches in metric spaces [J].Information Systems,2020,95(Jan.):101507.1-10507.22. [9]XIANG Y Z,TAN J X,HAN J S,et al.Survey of Graph Ma-tching Algorithms [J].Computer Science,2018,45(6):27-31. [10]XU Z B,ZHANG K,NING L H,et al.Summary of Graph Edit Distance[J].Computer Science,2018,45(4):11-18. [11]HART P E,NILSSON N J,RAPHAEL B.A formal basis for the heuristic determination of minimum cost paths[J].IEEE transactions on Systems Science and Cybernetics,1968,4(2):100-107. [12]ABU-AISHEH Z,RAVEAUX R,RAMEL J Y,et al.An exact graph edit distance algorithm for solving pattern recognition problems[C]//4th International Conference on Pattern Recognition Applications and Methods.Lisbon,Portugal,SCITEPRESS,2015:271-278. [13]ABU-AISHEH Z,RAVEAUX R,RAMEL J Y,et al.A Distri-buted Algorithm for Graph Edit Distance[C]//the Eighth International Conference on Advances in Databases,Knowledge and Data Applications.Lisbon,Portugal,2016:66-71. [14]GOUDA K,HASSAAN M.CSI_GED:An efficient approach for graph edit similarity computation[C]//32nd International Conference on Data Engineering(ICDE).Helsinki,Finland,IEEE,2016:265-276. [15]CHANG L,FENG X,LIN X,et al.Speeding up GED verification for graph similarity search[C]//36th International Confe-rence on Data Engineering(ICDE).Dallas,TX,USA,IEEE,2020:793-804. [16]CHEN X,HUO H W,HUAN J,et al.An efficient algorithm for graph edit distance computation[J].Knowledge-Based Systems,2019,163:762-775. [17]WANG G,WANG B,YANG X,et al.Efficiently indexing large sparse graphs for similarity search [J].IEEE Transactions on Knowledge and Data Engineering,2012,24(3):440-451. [18]ZHENG W G,ZOU L,LIAN X,et al.Efficient graph similarity search over large graph databases[J].IEEE Transactions on Knowledge and Data Engineering,2014,27(4):964-978. [19]ZHAO X,XIAO C,LIN X,et al.Efficient structure similaritysearches:a partition-based approach[J].The VLDB Journal,2018,27(1):53-78. [20]DIEGO R,RAMIRO T,POLO V.An exact approach for the multi-constraint graph partitioning problem[J].EURO Journal on Computational Optimization,2020,8(3/4):289-308. [21]LIANG Y J,ZHAO P X.Similarity search in graph databases:A Multi-Layered Indexing Approach[C]//2017 IEEE 33rd International Conference on Data Engineering(ICDE).San Diego,CA,USA,IEEE,2017:783-794. [22]BLUMENTHAL D B,BORIA N,GAMPER J,et al.Comparing heuristics for graph edit distance computation[J].The VLDB Journal,2020,29(1):419-458. [23]QIN J,XIAO C,WANG Y,et al.Generalizing the pigeonhole principle for similarity search in hamming space[J].IEEE Transactions on Knowledge and Data Engineering(TKDE),2019,33(2):489-505. [24]PIBIRI G E,VENTURINI R.Techniques for inverted indexcompression[J].ACM Computing Surveys,2021,52(6):125:1-125:36. [25]PIBIRI G E,VENTURINI R.On optimally partitioning variable-byte codes[J].IEEE Transactions on Knowledge and Data Engineering(TKDE),2020,32(9):1812-1823. [26]GIULI O,ERMANN O,PIBIR I,et al.Clustered Elias-Fano indexes[J].ACM Transactions on Information Systems,2018,36(1):2.1-2.33. |
|