Computer Science ›› 2024, Vol. 51 ›› Issue (5): 70-84.doi: 10.11896/jsjkx.230300003

• Database & Big Data & Data Science • Previous Articles     Next Articles

Diversified Top-k Pattern Mining on Large Graphs

HE Yuang, WANG Xin, SHEN Lingzhen   

  1. School of Computer Science,Southwest Petroleum University,Chengdu 610500,China
  • Received:2023-02-28 Revised:2023-06-21 Online:2024-05-15 Published:2024-05-08
  • About author:HE Yuang,born in 1996,postgraduate.His main research interests include data mining and machine learning.
    WANG Xin,born in 1981,Ph.D,professor,Ph.D supervisor,is a member of CCF(No.D0922M).His main research interests include knowledge discovery in database,artificial intelligence,machine learning and data mining.
  • Supported by:
    Foundation of Sichuan Innovational Talents of Scientific and Technology(2022JDRC0009).

Abstract: Frequent pattern mining(FPM) is one of the most important problems in graph mining.The FPM problem is defined as mining all the patterns,with frequency above a user-defined threshold in a large graph.In recent years,with the popularity of social networks and so on,single-graph-based FPM has received more and more attention.Investigators have developed considerable techniques,while most of them suffer from high computational cost,inconvenient result inspection and inconvenient in parallel computation.To tackle the issues,this paper proposes an approach to discover diversified top-k patterns from singe large graphs.This paper first designs a diversification function to measure the diversity of patterns,then develops a distributed algorithm with early termination property named DisTopk,to efficiently identify diversified top-k patterns,from distributive stored graphs.Expe-rimental results conducted on real-life and synthetic graphs show that DisTopk can mine diversified top-k patterns more efficiently than traditional algorithms.

Key words: Frequent pattern mining, Top-k patterns, Result diversification, Distributed mining, Early termination

CLC Number: 

  • TP311
[1]HUAN J,WANG W,PRINS J,et al.SPIN:mining maximal frequent subgraphs from graph databases[C]//Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2004:581-586.
[2]YAN X F,HAN J W.CloseGraph:mining closed frequent graph patterns[C]//Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2003:286-295.
[3]ZHU F,QU Q,LO D,et al.Mining top-k large structural patterns in a massive network[J].Proceedings of the VLDB Endowment,2020,4(11):807-818.
[4]KURAMOCHI M,KARYPIS G.Finding frequent patterns in a large sparse graph[J].Data mining and knowledge discovery,2005,11(3):243-271.
[5]ELSEIDY M,ABDELHAMID E,SKIADOPOULOS S,et al.GraMi:frequent subgraph and pattern mining in a single large graph[J].Proceedings of the VLDB Endowment,2014,7(7):517-528.
[6]ASHRAF N,HAQUE R R,ISLAM M,et al.WeFreS:weighted frequent subgraph mining in a single large graph[C]//19th Industrial Conference on Data Mining.Ibai Publishing,2019:201-215.
[7]LEN T,VO B,NGUYEN L B Q,et al.Mining weighted subgraphs in a single large graph[J].Information Sciences,2020,514(C):149-165.
[8]LI L,DING P,CHEN H,et al.Frequent pattern mining in big social graphs[J].IEEE Transactions on Emerging Topics in Computational Intelligence,2021,6(3):638-648.
[9]BORGWARDT K M,KRIEGEL H P,WACKERSREUTHERP.Pattern mining in frequent dynamic subgraphs[C]//Sixth International Conference on Data Mining.Piscataway:IEEE,2006:818-822.
[10]WACKERSREUTHER B,WACKERSREUTHER P,OSWALD A,et al.Frequent subgraph discovery in dynamic networks[C]//Proceedings of the Eighth Workshop on Mining and Learning with Graphs.New York:ACM,2010:155-162.
[11]ABDELHAMID E,CANIM M,SADOGHI M,et al.Incremental frequent subgraph mining on large evolving graphs[J].IEEE Transactions on Knowledge and Data Engineering,2017,29(12):2710-2723.
[12]DEAN J,GHEMAWAT S.MapReduce:simplified data proces-sing on large clusters[J].Communications of the ACM,2008,51(1):107-113.
[13]MALEWICZ G,AUSTERN M H,BIK A J C,et al.Pregel:a system for large-scale graph processing[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.New York:ACM,2010:135-146.
[14]GONZALEZ J E,XIN R S,DAVE A,et al.Graphx:Graph processing in a distributed dataflow framework[C]//11th USENIX Symposium on Operating Systems Design and Implementation.Berkeley:USENIX,2014:599-613.
[15]ZHU X,CHEN W,ZHENG W,et al.Gemini:A computation-centric distributed graph processing system[C]//12th USENIX Symposium on Operating Systems Design and Implementation.Berkeley:USENIX,2016:301-316.
[16]TEIXEIRA C H C,FONSECA A J,SERAFINI M,et al.Arabesque:a system for distributed graph mining[C]//Proceedings of the 25th Symposium on Operating Systems Principles.New York:ACM,2015:425-440.
[17]DIAS V,TEIXEIRA C H C,GUEDES D,et al.Fractal:A general-purpose graph pattern mining system[C]//Proceedings of the 2019 International Conference on Management of Data.New York:ACM,2019:1357-1374.
[18]ABDELHAMID E,ABDELAZIZ I,KALNIS P,et al.Scalemine:Scalable parallel frequent subgraph mining in a single large graph[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.USA:IEEE,2016:716-727.
[19]TALUKDER N,ZAKI M J.A distributed approach for graphmining in massive networks[J].Data mining and knowledge discovery,2016,30(5):1024-1052.
[20]CHEN H,LIU M,ZHAO Y,et al.G-miner:an efficient task-oriented graph mining system[C]//Proceedings of the Thirteenth EuroSys Conference.New York:ACM,2018:1-12.
[21]YAN D,QU W,GUO G,et al.PrefixFPM:a parallel framework for general-purpose mining of frequent and closed patterns[J].The VLDB Journal,2022,31(2):253-286.
[22]CHEN J,QIAN X.Khuzdul:Efficient and Scalable Distributed Graph Pattern Mining Engine[C]//Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems,Volume 2.New York:ACM,2023:413-426.
[23]PRATEEK A,KHAN A,GOYAL A,et al.Mining top-k pairs of correlated subgraphs in a large network[J].Proceedings of the VLDB Endowment,2020,13(9):1511-1524.
[24]WANG X,LAN Z,HE Y A,et al.A cost-effective approach for mining near-optimal top-k patterns[J].Expert Systems with Applications,2022,202:117262.
[25]NATARAJAN D,RANU S.Resling:a scalable and genericframework to mine top-k representative subgraph patterns[J].Knowledge and Information Systems,2018,54(1):123-149.
[26]DAVASHI R.ITUFP:A fast method for interactive mining of Top-K frequent patterns from uncertain data[J].Expert Systems with Applications,2023,214:119156.
[27]ZENG J,YAN X,HAN M,et al.Fast core-based top-k frequent pattern discovery in knowledge graphs[C]//2021 IEEE 37th International Conference on Data Engineering(ICDE).IEEE,2021:936-947.
[28]WANG X,XIANG M,ZHAN H,et al.Distributed Top-k Pattern Mining[C]//Asia-Pacific Web(APWeb) and Web-Age Information Management(WAIM) Joint International Conference on Web and Big Data.Springer,Cham,2021:203-220.
[29]SEMERTZIDIS K,PITOURA E.Top-k durable graph pattern queries on temporal graphs[J].IEEE Transactions on Know-ledge and Data Engineering,2018,31(1):181-194.
[30]GOLLAPUDI S,SHARMA A.An axiomatic approach for result diversification[C]//Proceedings of the 18th International Conference on World Wide Web.New York:ACM,2009:381-390.
[31]VIEIRA M R,RAZENTE H L,BARIONI M C N,et al.Divdb:A system for diversifying query results[J].Proceedings of the VLDB Endowment,2011,4(12):1395-1398.
[32]WANG X,DOU Z,SAKAI T,et al.Evaluating search result diversity using intent hierarchies[C]//Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval.New York:ACM,2016:415-424.
[33]HU B,ZHANG Y,CHEN W,et al.Characterizing search intent diversity into click models[C]//Proceedings of the 20th International Conference on World Wide Web.New York:ACM,2011:17-26.
[34]WELCH M J,CHO J,OLSTON C.Search result diversity for informational queries[C]//Proceedings of the 20th International Conference on World Wide Web.New York:ACM,2011:237-246.
[35]YU C,LAKSHMANAN L,AMER-YAHIA S.It takes variety to make a world:diversification in recommender systems[C]//Proceedings of the 12th International Conference on Extending Database Technology:Advances in Database Technology.New York:ACM,2009:368-378.
[36]HUANG X,CHENG H,LI R H,et al.Top-k structural diversity search in large networks[J].Proceedings of the VLDB Endowment,2013,6(13):1618-1629.
[37]YUAN L,QIN L,LIN X,et al.Diversified top-k clique search[J].The VLDB Journal,2016,25:171-196.
[38]YANG Z,FU A W C,LIU R.Diversified top-k subgraph querying in a large graph[C]//Proceedings of the 2016 International Conference on Management of Data.New York:ACM,2016:1167-1182.
[39]LI J,CAI T,DENG K,et al.Community-diversified influence maximization in social networks[J].Information Systems,2020,92:101522.
[40]CORDELLA L P,FOGGIA P,SANSONE C,et al.A(sub)graph isomorphism algorithm for matching large graphs[J].IEEE transactions on pattern analysis and machine intelligence,2004,26(10):1367-1372.
[41]BRINGMANN B,NIJSSEN S.What is frequent in a singlegraph?[C]//Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.Berlin:Springer,2008:858-863.
[42]FIEDLER M,BORGELT C.Subgraph support in a single large graph[C]//7th IEEE International Conference on Data Mining Workshops.USA:IEEE,2007:399-404.
[43]GUDES E,SHIMONY S E,VANETIK N.Discovering frequent graph patterns using disjoint paths[J].IEEE Transactions on Knowledge and Data Engineering,2006,18(11):1441-1456.
[44]KARYPIS G,KUMAR V.Parallel multilevel k-way partitioning scheme for irregular graphs[C]//Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.USA:IEEE Computer Society,1996.
[1] SHA Yuji, WANG Xin, HE Yanxiao, ZHONG Xueyan, FANG Yu. Mining and Application of Frequent Patterns with Counting Quantifiers [J]. Computer Science, 2023, 50(11A): 230100041-12.
[2] WU Cheng-feng, CAI Li, LI Jin, LIANG Yu. Frequent Pattern Mining of Residents’ Travel Based on Multi-source Location Data [J]. Computer Science, 2021, 48(7): 155-163.
[3] DENG Guo-qiang, TANG Min, LIANG Zhuang-chang. Divide-and-Conquer Algorithm for Sparse Polynomial Interpolation [J]. Computer Science, 2019, 46(5): 298-303.
[4] LEI Dong, WANG Tao and MA Yun-fei. Frequent Pattern Mining in Bit Stream Based on AC Algorithm [J]. Computer Science, 2017, 44(1): 128-133.
[5] DING Jian, HAN Meng and LI Juan. Review of Concept Drift Data Streams Mining Techniques [J]. Computer Science, 2016, 43(12): 24-29.
[6] . Advances in Study of Distributed Mining of Data Streams [J]. Computer Science, 2012, 39(1): 1-8.
[7] GAO Ang,YANG Yang,WANG Yue-wei. New Algorithm Research for Mining Workflow Frequent Pattern [J]. Computer Science, 2009, 36(9): 231-233.
[8] WU Jian,LI Xing-ming. Efficient Distributed Mining Algorithm for Alarm Correlation in Communication Networks [J]. Computer Science, 2009, 36(11): 204-207.
[9] DENG Song,WANG Ru-chuan,REN Xun-yi. Distributed Function Mining for GEP on Grid Services [J]. Computer Science, 2009, 36(11): 177-181.
[10] HE Hai-Tao, ZHANG Shi-Ling (College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004). [J]. Computer Science, 2008, 35(3): 200-202.
[11] . [J]. Computer Science, 2006, 33(9): 76-80.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!