计算机科学 ›› 2024, Vol. 51 ›› Issue (5): 70-84.doi: 10.11896/jsjkx.230300003
何宇昂, 王欣, 沈玲珍
HE Yuang, WANG Xin, SHEN Lingzhen
摘要: 频繁模式挖掘(Frequent Pattern Mining,FPM)是图数据挖掘领域的一项重要任务。该任务的目标是从图数据中找到出现频次大于给定阈值的所有模式。近年来,随着社交网络等大规模图数据的涌现,单一大图上的FPM问题受到广泛关注,并得到了较为充分的研究,取得了一系列研究成果。然而,已有技术大都存在着计算成本高、挖掘结果理解困难以及并行计算难等问题。针对上述问题,文中提出了一种从大规模图数据中挖掘多样化top-k模式的方法。首先设计了一个多样化函数,用于度量模式集合的多样性;随后设计了一种面向分布式图数据,具有提前终止特性的分布式挖掘算法DisTopk,以实现多样化top-k模式高效挖掘。在真实图数据和合成图数据上进行了大量实验,结果表明,与传统分布式挖掘算法相比,DisTopk算法能更高效地挖掘多样化top-k模式。
中图分类号:
[1]HUAN J,WANG W,PRINS J,et al.SPIN:mining maximal frequent subgraphs from graph databases[C]//Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2004:581-586. [2]YAN X F,HAN J W.CloseGraph:mining closed frequent graph patterns[C]//Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2003:286-295. [3]ZHU F,QU Q,LO D,et al.Mining top-k large structural patterns in a massive network[J].Proceedings of the VLDB Endowment,2020,4(11):807-818. [4]KURAMOCHI M,KARYPIS G.Finding frequent patterns in a large sparse graph[J].Data mining and knowledge discovery,2005,11(3):243-271. [5]ELSEIDY M,ABDELHAMID E,SKIADOPOULOS S,et al.GraMi:frequent subgraph and pattern mining in a single large graph[J].Proceedings of the VLDB Endowment,2014,7(7):517-528. [6]ASHRAF N,HAQUE R R,ISLAM M,et al.WeFreS:weighted frequent subgraph mining in a single large graph[C]//19th Industrial Conference on Data Mining.Ibai Publishing,2019:201-215. [7]LEN T,VO B,NGUYEN L B Q,et al.Mining weighted subgraphs in a single large graph[J].Information Sciences,2020,514(C):149-165. [8]LI L,DING P,CHEN H,et al.Frequent pattern mining in big social graphs[J].IEEE Transactions on Emerging Topics in Computational Intelligence,2021,6(3):638-648. [9]BORGWARDT K M,KRIEGEL H P,WACKERSREUTHERP.Pattern mining in frequent dynamic subgraphs[C]//Sixth International Conference on Data Mining.Piscataway:IEEE,2006:818-822. [10]WACKERSREUTHER B,WACKERSREUTHER P,OSWALD A,et al.Frequent subgraph discovery in dynamic networks[C]//Proceedings of the Eighth Workshop on Mining and Learning with Graphs.New York:ACM,2010:155-162. [11]ABDELHAMID E,CANIM M,SADOGHI M,et al.Incremental frequent subgraph mining on large evolving graphs[J].IEEE Transactions on Knowledge and Data Engineering,2017,29(12):2710-2723. [12]DEAN J,GHEMAWAT S.MapReduce:simplified data proces-sing on large clusters[J].Communications of the ACM,2008,51(1):107-113. [13]MALEWICZ G,AUSTERN M H,BIK A J C,et al.Pregel:a system for large-scale graph processing[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.New York:ACM,2010:135-146. [14]GONZALEZ J E,XIN R S,DAVE A,et al.Graphx:Graph processing in a distributed dataflow framework[C]//11th USENIX Symposium on Operating Systems Design and Implementation.Berkeley:USENIX,2014:599-613. [15]ZHU X,CHEN W,ZHENG W,et al.Gemini:A computation-centric distributed graph processing system[C]//12th USENIX Symposium on Operating Systems Design and Implementation.Berkeley:USENIX,2016:301-316. [16]TEIXEIRA C H C,FONSECA A J,SERAFINI M,et al.Arabesque:a system for distributed graph mining[C]//Proceedings of the 25th Symposium on Operating Systems Principles.New York:ACM,2015:425-440. [17]DIAS V,TEIXEIRA C H C,GUEDES D,et al.Fractal:A general-purpose graph pattern mining system[C]//Proceedings of the 2019 International Conference on Management of Data.New York:ACM,2019:1357-1374. [18]ABDELHAMID E,ABDELAZIZ I,KALNIS P,et al.Scalemine:Scalable parallel frequent subgraph mining in a single large graph[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.USA:IEEE,2016:716-727. [19]TALUKDER N,ZAKI M J.A distributed approach for graphmining in massive networks[J].Data mining and knowledge discovery,2016,30(5):1024-1052. [20]CHEN H,LIU M,ZHAO Y,et al.G-miner:an efficient task-oriented graph mining system[C]//Proceedings of the Thirteenth EuroSys Conference.New York:ACM,2018:1-12. [21]YAN D,QU W,GUO G,et al.PrefixFPM:a parallel framework for general-purpose mining of frequent and closed patterns[J].The VLDB Journal,2022,31(2):253-286. [22]CHEN J,QIAN X.Khuzdul:Efficient and Scalable Distributed Graph Pattern Mining Engine[C]//Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems,Volume 2.New York:ACM,2023:413-426. [23]PRATEEK A,KHAN A,GOYAL A,et al.Mining top-k pairs of correlated subgraphs in a large network[J].Proceedings of the VLDB Endowment,2020,13(9):1511-1524. [24]WANG X,LAN Z,HE Y A,et al.A cost-effective approach for mining near-optimal top-k patterns[J].Expert Systems with Applications,2022,202:117262. [25]NATARAJAN D,RANU S.Resling:a scalable and genericframework to mine top-k representative subgraph patterns[J].Knowledge and Information Systems,2018,54(1):123-149. [26]DAVASHI R.ITUFP:A fast method for interactive mining of Top-K frequent patterns from uncertain data[J].Expert Systems with Applications,2023,214:119156. [27]ZENG J,YAN X,HAN M,et al.Fast core-based top-k frequent pattern discovery in knowledge graphs[C]//2021 IEEE 37th International Conference on Data Engineering(ICDE).IEEE,2021:936-947. [28]WANG X,XIANG M,ZHAN H,et al.Distributed Top-k Pattern Mining[C]//Asia-Pacific Web(APWeb) and Web-Age Information Management(WAIM) Joint International Conference on Web and Big Data.Springer,Cham,2021:203-220. [29]SEMERTZIDIS K,PITOURA E.Top-k durable graph pattern queries on temporal graphs[J].IEEE Transactions on Know-ledge and Data Engineering,2018,31(1):181-194. [30]GOLLAPUDI S,SHARMA A.An axiomatic approach for result diversification[C]//Proceedings of the 18th International Conference on World Wide Web.New York:ACM,2009:381-390. [31]VIEIRA M R,RAZENTE H L,BARIONI M C N,et al.Divdb:A system for diversifying query results[J].Proceedings of the VLDB Endowment,2011,4(12):1395-1398. [32]WANG X,DOU Z,SAKAI T,et al.Evaluating search result diversity using intent hierarchies[C]//Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval.New York:ACM,2016:415-424. [33]HU B,ZHANG Y,CHEN W,et al.Characterizing search intent diversity into click models[C]//Proceedings of the 20th International Conference on World Wide Web.New York:ACM,2011:17-26. [34]WELCH M J,CHO J,OLSTON C.Search result diversity for informational queries[C]//Proceedings of the 20th International Conference on World Wide Web.New York:ACM,2011:237-246. [35]YU C,LAKSHMANAN L,AMER-YAHIA S.It takes variety to make a world:diversification in recommender systems[C]//Proceedings of the 12th International Conference on Extending Database Technology:Advances in Database Technology.New York:ACM,2009:368-378. [36]HUANG X,CHENG H,LI R H,et al.Top-k structural diversity search in large networks[J].Proceedings of the VLDB Endowment,2013,6(13):1618-1629. [37]YUAN L,QIN L,LIN X,et al.Diversified top-k clique search[J].The VLDB Journal,2016,25:171-196. [38]YANG Z,FU A W C,LIU R.Diversified top-k subgraph querying in a large graph[C]//Proceedings of the 2016 International Conference on Management of Data.New York:ACM,2016:1167-1182. [39]LI J,CAI T,DENG K,et al.Community-diversified influence maximization in social networks[J].Information Systems,2020,92:101522. [40]CORDELLA L P,FOGGIA P,SANSONE C,et al.A(sub)graph isomorphism algorithm for matching large graphs[J].IEEE transactions on pattern analysis and machine intelligence,2004,26(10):1367-1372. [41]BRINGMANN B,NIJSSEN S.What is frequent in a singlegraph?[C]//Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.Berlin:Springer,2008:858-863. [42]FIEDLER M,BORGELT C.Subgraph support in a single large graph[C]//7th IEEE International Conference on Data Mining Workshops.USA:IEEE,2007:399-404. [43]GUDES E,SHIMONY S E,VANETIK N.Discovering frequent graph patterns using disjoint paths[J].IEEE Transactions on Knowledge and Data Engineering,2006,18(11):1441-1456. [44]KARYPIS G,KUMAR V.Parallel multilevel k-way partitioning scheme for irregular graphs[C]//Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.USA:IEEE Computer Society,1996. |
|