Computer Science ›› 2023, Vol. 50 ›› Issue (9): 130-138.doi: 10.11896/jsjkx.220700105

• Database & Big Data & Data Science • Previous Articles     Next Articles

Graph Similarity Search with High Efficiency and Low Index

QIU Zhen, ZHENG Zhaohui   

  1. School of Computer Science and Technology,Soochow University,Suzhou,Jiangsu 215006,China
    Jiangsu Province Cyberspace Security Engineering Laboratory,Suzhou,Jiangsu 215006,China
  • Received:2022-07-11 Revised:2022-11-11 Online:2023-09-15 Published:2023-09-01
  • About author:QIU Zhen,born in 1997,postgraduate,is a member of China Computer Federation.Her main research interests include data retrieval and network security.
    ZHENG Zhaohui,born in 1968,professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include data mining and network security.
  • Supported by:
    Natural Science Foundation of the Jiangsu Higher Education Institutions of China(19KJA550002).

Abstract: Graph similarity search is to search the graph set that is similar to query graph under a measurement,which adopts the “filtering-verification” framework.Aiming at the problems of the existing methods,such as the untight lower bound and the large index space,an improved graph similarity search algorithm(Z-Index) based on query graph partition with multi-level filtering and low index space is proposed.Firstly,the pre-candidate set is obtained by global coarse-grained filtering.Secondly,a query graph partitioning algorithm based on extension probability is proposed,and a hierarchical filtering mechanism is adopted to further shrink the candidate set,so as to enhance the tightness of the lower bound.Finally,the sequence similarity difference is introduced to compute the sparsity of the data contribution.Then partition compression and difference compression algorithm are proposed to construct “zero” index structure,so as to reduce the index space.Experimental results show that Z-Index algorithm has a tighter lower bound,and the candidate set size of Z-Index can be reduced about 50%.Moreover,the algorithm execution time is greatly reduced,and it still shows great scalability in the case of tiny index space.

Key words: Graph similarity search, Hierarchical filtering, Extension probability, Coding compression, Query graph partitioning

CLC Number: 

  • TP391
[1]ZHENG W G,ZOU L,LIAN X,et al.Graph similarity search with edit distance constraint in large graph databases[C]//ACM International Conference on Conference on Information & Knowledge Management.ACM,2013:1595-1600.
[2]ZHAO X,XIAO C,LIN X,et al.Efficient processing of graph similarity queries with edit distance constraints[J].The VLDB Journal,2013,22(6):727-752.
[3]ZHAO X,XIAO C,LIN X,et al.A partition-based approach to structure similarity search[J].The VLDB Journal,2013,7(3):169-180.
[4]GOUDA K,ARAFA M.An improved global lower bound for graph edit similarity search[J].Pattern Recognition Letters,2015,58(Jun.1):8-14.
[5]WESKAMP N,HULLERMEIER E,KUHN D,et al.Multiple graph alignment for the structural analysis of protein active sites[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2007,4(2):310-320.
[6]CHENG J,KE Y,FU W C,et al.Fast graph query processing with a low-cost index[J].The VLDB Journal,2011,20(4):521-539.
[7]WANG Y,FENG Z H,CHEN L,et al.Efficient similaritysearch for sets over graphs[J].IEEE Transactions on Know-ledge and Data Engineering,2021,33(2):444-458.
[8]SHIMOMURA L C,OYAMADA R S,VIEIRA M R,et al.Asurvey on graph-based methods for similarity searches in metric spaces [J].Information Systems,2020,95(Jan.):101507.1-10507.22.
[9]XIANG Y Z,TAN J X,HAN J S,et al.Survey of Graph Ma-tching Algorithms [J].Computer Science,2018,45(6):27-31.
[10]XU Z B,ZHANG K,NING L H,et al.Summary of Graph Edit Distance[J].Computer Science,2018,45(4):11-18.
[11]HART P E,NILSSON N J,RAPHAEL B.A formal basis for the heuristic determination of minimum cost paths[J].IEEE transactions on Systems Science and Cybernetics,1968,4(2):100-107.
[12]ABU-AISHEH Z,RAVEAUX R,RAMEL J Y,et al.An exact graph edit distance algorithm for solving pattern recognition problems[C]//4th International Conference on Pattern Recognition Applications and Methods.Lisbon,Portugal,SCITEPRESS,2015:271-278.
[13]ABU-AISHEH Z,RAVEAUX R,RAMEL J Y,et al.A Distri-buted Algorithm for Graph Edit Distance[C]//the Eighth International Conference on Advances in Databases,Knowledge and Data Applications.Lisbon,Portugal,2016:66-71.
[14]GOUDA K,HASSAAN M.CSI_GED:An efficient approach for graph edit similarity computation[C]//32nd International Conference on Data Engineering(ICDE).Helsinki,Finland,IEEE,2016:265-276.
[15]CHANG L,FENG X,LIN X,et al.Speeding up GED verification for graph similarity search[C]//36th International Confe-rence on Data Engineering(ICDE).Dallas,TX,USA,IEEE,2020:793-804.
[16]CHEN X,HUO H W,HUAN J,et al.An efficient algorithm for graph edit distance computation[J].Knowledge-Based Systems,2019,163:762-775.
[17]WANG G,WANG B,YANG X,et al.Efficiently indexing large sparse graphs for similarity search [J].IEEE Transactions on Knowledge and Data Engineering,2012,24(3):440-451.
[18]ZHENG W G,ZOU L,LIAN X,et al.Efficient graph similarity search over large graph databases[J].IEEE Transactions on Knowledge and Data Engineering,2014,27(4):964-978.
[19]ZHAO X,XIAO C,LIN X,et al.Efficient structure similaritysearches:a partition-based approach[J].The VLDB Journal,2018,27(1):53-78.
[20]DIEGO R,RAMIRO T,POLO V.An exact approach for the multi-constraint graph partitioning problem[J].EURO Journal on Computational Optimization,2020,8(3/4):289-308.
[21]LIANG Y J,ZHAO P X.Similarity search in graph databases:A Multi-Layered Indexing Approach[C]//2017 IEEE 33rd International Conference on Data Engineering(ICDE).San Diego,CA,USA,IEEE,2017:783-794.
[22]BLUMENTHAL D B,BORIA N,GAMPER J,et al.Comparing heuristics for graph edit distance computation[J].The VLDB Journal,2020,29(1):419-458.
[23]QIN J,XIAO C,WANG Y,et al.Generalizing the pigeonhole principle for similarity search in hamming space[J].IEEE Transactions on Knowledge and Data Engineering(TKDE),2019,33(2):489-505.
[24]PIBIRI G E,VENTURINI R.Techniques for inverted indexcompression[J].ACM Computing Surveys,2021,52(6):125:1-125:36.
[25]PIBIRI G E,VENTURINI R.On optimally partitioning variable-byte codes[J].IEEE Transactions on Knowledge and Data Engineering(TKDE),2020,32(9):1812-1823.
[26]GIULI O,ERMANN O,PIBIR I,et al.Clustered Elias-Fano indexes[J].ACM Transactions on Information Systems,2018,36(1):2.1-2.33.
[1] LU Yuhan, CHEN Liquan, WANG Yu, HU Zhiyuan. Efficient Encrypted Image Content Retrieval System Based on Secure CNN [J]. Computer Science, 2023, 50(9): 26-34.
[2] WANG Jing, ZHANG Miao, LIU Yang, LI Haoling, LI Haotian, WANG Bailing, WEI Yuliang. Study on Dual-security Knowledge Graph for Process Industrial Control [J]. Computer Science, 2023, 50(9): 68-74.
[3] YANG Yi, SHEN Sheng, DOU Zhiyang, LI Yuan, HAN Zhenjun. Tiny Person Detection for Intelligent Video Surveillance [J]. Computer Science, 2023, 50(9): 75-81.
[4] CHEN Yunliang, LIU Hao, ZHU Guishui, HUANG Xiaohui, CHEN Xiaodao, WANG Lizhe. Study on Supervised Learning Model for Optimal Histogram Solution [J]. Computer Science, 2023, 50(9): 145-151.
[5] LI Haiming, ZHU Zhiheng, LIU Lei, GUO Chenkai. Multi-task Graph-embedding Deep Prediction Model for Mobile App Rating Recommendation [J]. Computer Science, 2023, 50(9): 160-167.
[6] WANG Wei, DU Xiangcheng, JIN Cheng. Image Relighting Network Based on Context-gated Residuals and Multi-scale Attention [J]. Computer Science, 2023, 50(9): 168-175.
[7] WANG Luo, LI Biao, FU Ruigang. Infrared Ground Multi-object Tracking Method Based on Improved ByteTrack Algorithm [J]. Computer Science, 2023, 50(9): 176-183.
[8] HU Shen, QIAN Yuhua, WANG Jieting, LI Feijiang, LYU Wei. Super Multi-class Deep Image Clustering Model Based on Contrastive Learning [J]. Computer Science, 2023, 50(9): 192-201.
[9] BAI Zhengyao, XU Zhu, ZHANG Yihan. Deep Artificial Correspondence Generation for 3D Point Cloud Registration [J]. Computer Science, 2023, 50(9): 210-219.
[10] LIU Peigang, SUN Jie, YANG Chaozhi, LI Zongmin. Crowd Counting Based on Multi-scale Feature Aggregation in Dense Scenes [J]. Computer Science, 2023, 50(9): 235-241.
[11] ZHAI Lizhi, LI Ruixiang, YANG Jiabei, RAO Yuan, ZHANG Qitan, ZHOU Yun. Overview About Composite Semantic-based Event Graph Construction [J]. Computer Science, 2023, 50(9): 242-259.
[12] LIU Yubo, GUO Bin, MA Ke, QIU Chen, LIU Sicong. Design of Visual Context-driven Interactive Bot System [J]. Computer Science, 2023, 50(9): 260-268.
[13] AN Haojia, SHI Dianxi, LI lin, SUN Yixuan, YANG Shaowu, CHEN Xucan. TAMP:A Hierarchical Multi-robot Task Assignment Method for Area Coverage [J]. Computer Science, 2023, 50(9): 269-277.
[14] YI Liu, GENG Xinyu, BAI Jing. Hierarchical Multi-label Text Classification Algorithm Based on Parallel Convolutional Network Information Fusion [J]. Computer Science, 2023, 50(9): 278-286.
[15] LUO Yuanyuan, YANG Chunming, LI Bo, ZHANG Hui, ZHAO Xujian. Chinese Medical Named Entity Recognition Method Incorporating Machine ReadingComprehension [J]. Computer Science, 2023, 50(9): 287-294.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!