计算机科学 ›› 2024, Vol. 51 ›› Issue (5): 70-84.doi: 10.11896/jsjkx.230300003

• 数据库&大数据&数据科学 • 上一篇    下一篇

大图中多样化Top-k模式挖掘算法研究

何宇昂, 王欣, 沈玲珍   

  1. 西南石油大学计算机科学学院 成都 61050
  • 收稿日期:2023-02-28 修回日期:2023-06-21 出版日期:2024-05-15 发布日期:2024-05-08
  • 通讯作者: 王欣(xinwang@swpu.edu.cn)
  • 作者简介:(202022000311@stu.swpu.edu.cn)
  • 基金资助:
    四川省科技创新人才基金(2022JDRC0009)

Diversified Top-k Pattern Mining on Large Graphs

HE Yuang, WANG Xin, SHEN Lingzhen   

  1. School of Computer Science,Southwest Petroleum University,Chengdu 610500,China
  • Received:2023-02-28 Revised:2023-06-21 Online:2024-05-15 Published:2024-05-08
  • About author:HE Yuang,born in 1996,postgraduate.His main research interests include data mining and machine learning.
    WANG Xin,born in 1981,Ph.D,professor,Ph.D supervisor,is a member of CCF(No.D0922M).His main research interests include knowledge discovery in database,artificial intelligence,machine learning and data mining.
  • Supported by:
    Foundation of Sichuan Innovational Talents of Scientific and Technology(2022JDRC0009).

摘要: 频繁模式挖掘(Frequent Pattern Mining,FPM)是图数据挖掘领域的一项重要任务。该任务的目标是从图数据中找到出现频次大于给定阈值的所有模式。近年来,随着社交网络等大规模图数据的涌现,单一大图上的FPM问题受到广泛关注,并得到了较为充分的研究,取得了一系列研究成果。然而,已有技术大都存在着计算成本高、挖掘结果理解困难以及并行计算难等问题。针对上述问题,文中提出了一种从大规模图数据中挖掘多样化top-k模式的方法。首先设计了一个多样化函数,用于度量模式集合的多样性;随后设计了一种面向分布式图数据,具有提前终止特性的分布式挖掘算法DisTopk,以实现多样化top-k模式高效挖掘。在真实图数据和合成图数据上进行了大量实验,结果表明,与传统分布式挖掘算法相比,DisTopk算法能更高效地挖掘多样化top-k模式。

关键词: 频繁模式挖掘, Top-k模式, 结果多样性, 分布式挖掘, 提前终止

Abstract: Frequent pattern mining(FPM) is one of the most important problems in graph mining.The FPM problem is defined as mining all the patterns,with frequency above a user-defined threshold in a large graph.In recent years,with the popularity of social networks and so on,single-graph-based FPM has received more and more attention.Investigators have developed considerable techniques,while most of them suffer from high computational cost,inconvenient result inspection and inconvenient in parallel computation.To tackle the issues,this paper proposes an approach to discover diversified top-k patterns from singe large graphs.This paper first designs a diversification function to measure the diversity of patterns,then develops a distributed algorithm with early termination property named DisTopk,to efficiently identify diversified top-k patterns,from distributive stored graphs.Expe-rimental results conducted on real-life and synthetic graphs show that DisTopk can mine diversified top-k patterns more efficiently than traditional algorithms.

Key words: Frequent pattern mining, Top-k patterns, Result diversification, Distributed mining, Early termination

中图分类号: 

  • TP311
[1]HUAN J,WANG W,PRINS J,et al.SPIN:mining maximal frequent subgraphs from graph databases[C]//Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2004:581-586.
[2]YAN X F,HAN J W.CloseGraph:mining closed frequent graph patterns[C]//Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2003:286-295.
[3]ZHU F,QU Q,LO D,et al.Mining top-k large structural patterns in a massive network[J].Proceedings of the VLDB Endowment,2020,4(11):807-818.
[4]KURAMOCHI M,KARYPIS G.Finding frequent patterns in a large sparse graph[J].Data mining and knowledge discovery,2005,11(3):243-271.
[5]ELSEIDY M,ABDELHAMID E,SKIADOPOULOS S,et al.GraMi:frequent subgraph and pattern mining in a single large graph[J].Proceedings of the VLDB Endowment,2014,7(7):517-528.
[6]ASHRAF N,HAQUE R R,ISLAM M,et al.WeFreS:weighted frequent subgraph mining in a single large graph[C]//19th Industrial Conference on Data Mining.Ibai Publishing,2019:201-215.
[7]LEN T,VO B,NGUYEN L B Q,et al.Mining weighted subgraphs in a single large graph[J].Information Sciences,2020,514(C):149-165.
[8]LI L,DING P,CHEN H,et al.Frequent pattern mining in big social graphs[J].IEEE Transactions on Emerging Topics in Computational Intelligence,2021,6(3):638-648.
[9]BORGWARDT K M,KRIEGEL H P,WACKERSREUTHERP.Pattern mining in frequent dynamic subgraphs[C]//Sixth International Conference on Data Mining.Piscataway:IEEE,2006:818-822.
[10]WACKERSREUTHER B,WACKERSREUTHER P,OSWALD A,et al.Frequent subgraph discovery in dynamic networks[C]//Proceedings of the Eighth Workshop on Mining and Learning with Graphs.New York:ACM,2010:155-162.
[11]ABDELHAMID E,CANIM M,SADOGHI M,et al.Incremental frequent subgraph mining on large evolving graphs[J].IEEE Transactions on Knowledge and Data Engineering,2017,29(12):2710-2723.
[12]DEAN J,GHEMAWAT S.MapReduce:simplified data proces-sing on large clusters[J].Communications of the ACM,2008,51(1):107-113.
[13]MALEWICZ G,AUSTERN M H,BIK A J C,et al.Pregel:a system for large-scale graph processing[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.New York:ACM,2010:135-146.
[14]GONZALEZ J E,XIN R S,DAVE A,et al.Graphx:Graph processing in a distributed dataflow framework[C]//11th USENIX Symposium on Operating Systems Design and Implementation.Berkeley:USENIX,2014:599-613.
[15]ZHU X,CHEN W,ZHENG W,et al.Gemini:A computation-centric distributed graph processing system[C]//12th USENIX Symposium on Operating Systems Design and Implementation.Berkeley:USENIX,2016:301-316.
[16]TEIXEIRA C H C,FONSECA A J,SERAFINI M,et al.Arabesque:a system for distributed graph mining[C]//Proceedings of the 25th Symposium on Operating Systems Principles.New York:ACM,2015:425-440.
[17]DIAS V,TEIXEIRA C H C,GUEDES D,et al.Fractal:A general-purpose graph pattern mining system[C]//Proceedings of the 2019 International Conference on Management of Data.New York:ACM,2019:1357-1374.
[18]ABDELHAMID E,ABDELAZIZ I,KALNIS P,et al.Scalemine:Scalable parallel frequent subgraph mining in a single large graph[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.USA:IEEE,2016:716-727.
[19]TALUKDER N,ZAKI M J.A distributed approach for graphmining in massive networks[J].Data mining and knowledge discovery,2016,30(5):1024-1052.
[20]CHEN H,LIU M,ZHAO Y,et al.G-miner:an efficient task-oriented graph mining system[C]//Proceedings of the Thirteenth EuroSys Conference.New York:ACM,2018:1-12.
[21]YAN D,QU W,GUO G,et al.PrefixFPM:a parallel framework for general-purpose mining of frequent and closed patterns[J].The VLDB Journal,2022,31(2):253-286.
[22]CHEN J,QIAN X.Khuzdul:Efficient and Scalable Distributed Graph Pattern Mining Engine[C]//Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems,Volume 2.New York:ACM,2023:413-426.
[23]PRATEEK A,KHAN A,GOYAL A,et al.Mining top-k pairs of correlated subgraphs in a large network[J].Proceedings of the VLDB Endowment,2020,13(9):1511-1524.
[24]WANG X,LAN Z,HE Y A,et al.A cost-effective approach for mining near-optimal top-k patterns[J].Expert Systems with Applications,2022,202:117262.
[25]NATARAJAN D,RANU S.Resling:a scalable and genericframework to mine top-k representative subgraph patterns[J].Knowledge and Information Systems,2018,54(1):123-149.
[26]DAVASHI R.ITUFP:A fast method for interactive mining of Top-K frequent patterns from uncertain data[J].Expert Systems with Applications,2023,214:119156.
[27]ZENG J,YAN X,HAN M,et al.Fast core-based top-k frequent pattern discovery in knowledge graphs[C]//2021 IEEE 37th International Conference on Data Engineering(ICDE).IEEE,2021:936-947.
[28]WANG X,XIANG M,ZHAN H,et al.Distributed Top-k Pattern Mining[C]//Asia-Pacific Web(APWeb) and Web-Age Information Management(WAIM) Joint International Conference on Web and Big Data.Springer,Cham,2021:203-220.
[29]SEMERTZIDIS K,PITOURA E.Top-k durable graph pattern queries on temporal graphs[J].IEEE Transactions on Know-ledge and Data Engineering,2018,31(1):181-194.
[30]GOLLAPUDI S,SHARMA A.An axiomatic approach for result diversification[C]//Proceedings of the 18th International Conference on World Wide Web.New York:ACM,2009:381-390.
[31]VIEIRA M R,RAZENTE H L,BARIONI M C N,et al.Divdb:A system for diversifying query results[J].Proceedings of the VLDB Endowment,2011,4(12):1395-1398.
[32]WANG X,DOU Z,SAKAI T,et al.Evaluating search result diversity using intent hierarchies[C]//Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval.New York:ACM,2016:415-424.
[33]HU B,ZHANG Y,CHEN W,et al.Characterizing search intent diversity into click models[C]//Proceedings of the 20th International Conference on World Wide Web.New York:ACM,2011:17-26.
[34]WELCH M J,CHO J,OLSTON C.Search result diversity for informational queries[C]//Proceedings of the 20th International Conference on World Wide Web.New York:ACM,2011:237-246.
[35]YU C,LAKSHMANAN L,AMER-YAHIA S.It takes variety to make a world:diversification in recommender systems[C]//Proceedings of the 12th International Conference on Extending Database Technology:Advances in Database Technology.New York:ACM,2009:368-378.
[36]HUANG X,CHENG H,LI R H,et al.Top-k structural diversity search in large networks[J].Proceedings of the VLDB Endowment,2013,6(13):1618-1629.
[37]YUAN L,QIN L,LIN X,et al.Diversified top-k clique search[J].The VLDB Journal,2016,25:171-196.
[38]YANG Z,FU A W C,LIU R.Diversified top-k subgraph querying in a large graph[C]//Proceedings of the 2016 International Conference on Management of Data.New York:ACM,2016:1167-1182.
[39]LI J,CAI T,DENG K,et al.Community-diversified influence maximization in social networks[J].Information Systems,2020,92:101522.
[40]CORDELLA L P,FOGGIA P,SANSONE C,et al.A(sub)graph isomorphism algorithm for matching large graphs[J].IEEE transactions on pattern analysis and machine intelligence,2004,26(10):1367-1372.
[41]BRINGMANN B,NIJSSEN S.What is frequent in a singlegraph?[C]//Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.Berlin:Springer,2008:858-863.
[42]FIEDLER M,BORGELT C.Subgraph support in a single large graph[C]//7th IEEE International Conference on Data Mining Workshops.USA:IEEE,2007:399-404.
[43]GUDES E,SHIMONY S E,VANETIK N.Discovering frequent graph patterns using disjoint paths[J].IEEE Transactions on Knowledge and Data Engineering,2006,18(11):1441-1456.
[44]KARYPIS G,KUMAR V.Parallel multilevel k-way partitioning scheme for irregular graphs[C]//Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.USA:IEEE Computer Society,1996.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!