计算机科学 ›› 2024, Vol. 51 ›› Issue (4): 11-18.doi: 10.11896/jsjkx.231100049
王瀚橙, 戴海鹏, 陈志鹏, 陈树森, 陈贵海
WANG Hancheng, DAI Haipeng, CHEN Zhipeng, CHEN Shusen, CHEN Guihai
摘要: 社区发现是社会网络挖掘领域的基本问题。随着海量数据的迅速产生,传统社区发现算法愈发难以处理大规模社会网络。因此,针对大规模网络设计高效的社区发现算法意义重大。文中提出了一种基于MapReduce和k中心聚类的新型分布式算法。首先,该算法提出“朋友圈系数”技术,该技术可更加准确地度量结点间的距离。其次,该算法提出“两阶段k中心聚类”技术,该技术在选取中心点过程中融入结点中心度启发式信息,可显著优化输出结果的模块度。最后,该算法提出“以模块度为优化目标的社区融合”技术,该技术能够在无先验知识的前提下自动确定网络中的社区数目。实验结果表明,所提算法的社区发现结果模块度明显优于最先进的社区发现算法。例如,相比LPA算法,其将模块度平均提升9.19倍。
中图分类号:
[1]NEWMAN M E J,GIRVAN M.Finding and evaluating community structure in networks[J].Physical Review E,2004,69(2):1-16. [2]GAO Y,ZHANG H L.Survey on community detection method based on random walk[J].Journal on Communications,2023,44(6):198-210. [3]SANGAIAH A K,REZAEI S,JAVADPOUR A,et al.Explai-nable AI in big data intelligence of community detection for digitalization e-healthcare services[J].Applied Soft Computing,2023,136(1):1-10. [4]YANG H L,ZHAO X,CHEN C,et al.Community detection in social network based on node influence expansion[J].Journal of Harbin University of Science and Technology,2023,28(3):10-19. [5]GHAREHCHOPOGH F S.An improved harris hawks optimization algorithm with multi-strategy for community detection in social network[J].Journal of Bionic Engineering,2023,20(3):1175-1197. [6]NAIK D,RAMESH D,GOROJANAM N B.Enhanced link prediction using sentiment attribute and community detection[J].Journal of Ambient Intelligence and Humanized Computing,2023,14(4):4157-4174. [7]MITTAL A,GOEL A.A review on community detection approach for recommender systems[C]//Proceedings of the International Conference on Communication and Electronics Systems.2023:1521-1526. [8]AMIRA A,DERHAB A,KARBAB E B,et al.A survey of malware analysis using community detection algorithms[J].ACM Computing Surveys,2023,56(2):1-29. [9]FORTUNATO S.Community detection in graphs[J].Physics Reports,2010,486(3):75-174. [10]NEWMAN M E J.Fast algorithm for detecting communitystructure in networks[J].Physical Review E,2004,69(6):1-5. [11]RAGHAVAN U N,ALBERT R,KUMARA S.Near linear time algorithm to detect community structures in large-scale networks[J].Physical Review E,2007,76(3):1-11. [12]KERNIGHAN B W,LIN S.An efficient heuristic procedure for partitioning graphs[J].The BellSystem Technical Journal,1970,49(2):291-307. [13]LUO Z G,DING F,JIANG X Z,et al.New progress on community detection in complex networks[J].Journal of National University of Defense Technology,2011,33(1):47-52. [14]SUN H,LIU J,HUANG J,et al.LinkLPA:A link-based label propagation algorithm for overlapping community detection in networks[J].Computational Intelligence,2017,33(2):308-331. [15]FLAKE G W,LAWRENCE S,GILES C L.Efficient identification of web communities[C]//Proceedings of the International Conference on Knowledge Discovery and Data Mining.2000:150-160. [16]NEWMAN M E J.Modularity and community structure in networks[J].Proceedings of the National Academy of Sciences,2006,103(23):8577-8582. [17]CLAUSET A,NEWMAN M E J,MOORE C.Finding community structure in very large networks[J].Physical Review E,2004,70(6):1-6. [18]PIZZUTI C.Evolutionary computation for community detection in networks:a review[J].IEEE Transactions on Evolutionary Computation,2017,22(3):464-483. [19]LIM S,KIM J,LEE J G.BlackHole:Robust community detection inspired by graph drawing[C]//Proceedings of the International Conference on Data Engineering.2016:25-36. [20]SINGH A,GARG S,BATRA S,et al.Probabilistic data structure-based community detection and storage scheme in online social networks[J].Future Generation Computer Systems,2019,94(1):173-184. [21]BLOOM B H.Space/time trade-offs in hash coding with allowable errors[J].Communications of the ACM,1970,13(7):422-426. [22]BENDER M A,FARACH-COLTON M,JOHNSON R,et al.Don't thrash:How to cache your hash on flash[J].Proceedings of the VLDB Endowment,2012,5(11):1627-1637. [23]SHIOKAWA H,FUJIWARA Y,ONIZUKA M.Fast algorithm for modularity-based graph clustering[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2013:1170-1176. [24]RIEDY J,BADER D A,MEYERHENKE H.Scalable multi-threaded community detection in social networks[C]//Procee-dings of the International Parallel and Distributed Processing Symposium Workshops & PhD Forum.2012:1619-1628. [25]YANG S,WANG B,ZHAO H,et al.Efficient dense structure mining using MapReduce[C]//Proceedings of the International Conference on Data Mining Workshops.2009:332-337. [26]SHI J,XUE W,WANG W,et al.Scalable community detection in massive social networks using MapReduce[J].IBM Journal of Research and Development,2013,57(3):1-12. [27]ZHANG Y,WANG J,WANG Y,et al.Parallel community de-tection on large networks with propinquity dynamics[C]//Proceedings of the International Conference on Knowledge Disco-very and Data Mining.2009:997-1006. [28]WU Z,GAO G,BU Z,et al.SIMPLE:Simplifying-ensemblingframework for parallel community detection from large networks[J].Cluster Computing,2016,19(1):211-221. [29]GONZALEZ T F.Clustering to minimize the maximum intercluster distance[J].Theoretical Computer Science,1985,38(C):293-306. [30]ZACHARY W W.An information flow model for conflict andfission in small groups[J].Journal ofAnthropological Research,1977,33(4):452-473. [31]LUSSEAU D,SCHNEIDER K,BOISSEAU O J,et al.The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations[J].Behavioral Ecology and Sociobiology,2003,54(4):396-405. [32]GIRVAN M,NEWMAN M E J.Community structure in social and biological networks[J].Proceedings of the National Academy of Sciences,2002,99(12):7821-7826. [33]LESKOVEC J,MCAULEY J J.Learning to discover social circles in ego networks[J].Advances in Neural Information Processing Systems,2012,1(1):539-547. [34]ROZEMBERCZKI B,ALLEN C,SARKAR R.Multi-scale attributed node embedding[J].Journal of Complex Networks,2021,9(2):1-22. [35]YANG J,LESKOVEC J.Defining and evaluating network communities based on ground-truth[J].Knowledge and Information Systems,2015,42(1):181-213. [36]CHO E,MYERS S A,LESKOVEC J.Friendship and mobility[C]//Proceedings of the International Conference on Knowledge Discovery and Data Mining.2011:1082-1090. [37]LESKOVEC J,KLEINBERG J,FALOUTSOS C.Graph evolu-tion:Densification and shrinking diameters[J].ACM Transactions on Knowledge Discovery from Data,2007,1(1):2-41. [38]WANG H,DAI H,CHEN S,et al.Filterdata structures:A survey[J].Computer Science,2024,51(1):35-40. [39]WANG H,DAI H,LI M,et al.Bamboo filters:Make resizing smooth[C]//Proceedings of International Conference on Data Engineering.2022:979-991. [40]LI M,CHEN D,DAI H,et al.Seesaw counting filter:An efficient guardian for vulnerable negative keys during dynamic filtering[C]//Proceedings of the ACM Web Conference.2022:2759-2767. [41]WANG H,DAI H,GU R,et al.Wormhole filters:Caching your hash on persistent memory[C]//Proceedings of European Conference on Computer Systems.2024:1-16. [42]DAI H,YU J,LI M,et al.Bloom filter with noisy coding framework for multi-set membership testing[J].IEEE Transactions on Knowledge and Data Engineering,2022,35(7):6710-6724. [43]DAI H,LI M,LIU A X,et al.Finding persistent items in distributed datasets[J].IEEE/ACM Transactions on Networking,2020,28(1):1-14. [44]DAI H,SHAHZAD M,LIU A X,et al.Identifying and estimating persistent items in data streams[J].IEEE/ACM Transactions on Networking,2018,26(6):2429-2442. [45]XIA R,DAI H,DU Z,et al.Mining robust frequent items in data streams[C]//Proceedings of International Conference on Joint Cloud Computing.2020:110-117. [46]LIU J,DAI H,XIA R,et al.DUET:A generic framework forfinding special quadratic elements in data streams[C]//Procee-dings of ACM Web Conference.2022:2989-2997. [47]LI M,DAI H,WANG X,et al.Thresholded monitoring in distributed data streams[C]//Proceedings of International Confe-rence on Distributed Computing Systems.2019:218-227. [48]DAI H,SHAHZAD M,LIU A X,et al.Finding persistent items in data streams[J].Proceedings of the VLDB Endowment,2016,10(4):289-300. [49]FU P,LUO L,GUO D,et al.Jump filter:Dynamic sketch design for big data governance[J].Journal of Software,2023,34(3):1193-1212. [50]XIE K,WEN J,ZHANG D,et al.Bloom filter query algorithm[J].Journal of Software,2009,20(1):96-108. [51]YU S,LI X,WANG H,et al.C_CART:An instance confidence-based decision tree algorithm for classification[J].Intelligent Data Analysis,2021,25(4):929-948. [52]YU S,LI X,WANG H,et al.BIDI:A classification algorithm with instance difficulty invariance[J].Expert Systems with Applications,2021,165(1):1-13. [53]DAI H,LI M,WANG H,et al.Distantly supervised entity lin-king with selection consistency constraint[C]//Proceedings of International Conference on Database Systems for Advanced Applications.2023:784-799. [54]DAI M,XU S,SHAO S,et al.Blockchainbased reliable fog cloud service solution for IIoT[J].Chinese Journal of Electronics,2021,30(2):359-366. [55]Li P,YU X,XU H,et al.Secure localization technology based on dynamic trust management in wireless sensor networks[J].Chinese Journal of Electronics,2021,30(4):759-768. [56]LI L,WU J,HU H,et al.Secure cloud architecture for 5G core network[J].Chinese Journal of Electronics,2021,30(3):516-522. [57]DAI H,WANG H,CHEN Z,et al.Variable-length encodingframework:A generic framework for enhancing the accuracy of approximate membership queries[C]//Proceedings of the 23rd International Conference on Data Mining.2023:61-70. [58]LI M,WANG H,DAI H,et al.A survey of multi-dimensional indexes:Past and future trends[J].IEEE Transactions on Knowledge and Data Engineering,2024,36(1):1-20. [59]GU R,LI S,DAI H,et al.Adaptive online cache capacity optimization via lightweight working set size estimation at scale[C]//Proceedings of USENIX Annual Technical Conference.2023:467-484. [60]DAI H,ZHONG Y,LIU A,et al.Noisy bloom filters for multi-set membership testing[C]//Proceedings of International Conference on Measurement and Modeling of Computer Science.2016:139-151. [61]LI Y,WANG Z,YANG R,et al.Learned bloom filter for multi-key membership testing[C]//Proceedings of the International Conference on Database Systems for Advanced Applications.2023:62-79. [62]DAYAN N,BERCEA I,REVIRIEGO P,et al.InfiniFilter:Expanding filters to infinity and beyond[J].Proceedings of the ACM on Management of Data,2023,1(2):1-27. [63]PANDEY P,CONWAY A,DURIE J,et al.Vector quotient filters:Overcoming the time/space trade-off in filter design[C]//Proceedings of the International Conference on Management of Data.2021:1386-1399. [64]PANDEY P,BENDER M A,JOHNSON R,et al.A general-purpose counting filter:Making every bit count[C]//Proceedings of the International Conference on Management of Data.2017:775-787. [65]HUANG K,YANG T.Tagged cuckoo filters[C]//Proceedings of the International Conference on Computer Communications and Networks.2021:1-10. [66]ZENG Z,XIAO R,LIN X,et al.Double locality sensitive ha-shing Bloom filter for high-dimensional streaming anomaly detection[J].Information Processing and Management,2023,60(3):1-18. [67]QU Y,SUN H,DONG C,et al.Elastic collaborative edge intelligence for UAV swarm:Architecture,challenges,and opportunities[J].IEEE Communications Magazine,2024,62(1):62-68. [68]HE S,SHI K,LIU C,et al.Collaborative sensing in internet of things:A comprehensive survey[J].IEEE Communications Surveys & Tutorials,2022,24(3):1435-1474. [69]DENG X,JIANG Y,YANG L T,et al.Data fusion based cove-rage optimization in heterogeneous sensor networks:A survey[J].Information Fusion,2019,52(1):90-105. |
|