Computer Science ›› 2024, Vol. 51 ›› Issue (4): 11-18.doi: 10.11896/jsjkx.231100049

• Compact Data Structure • Previous Articles     Next Articles

Large-scale Network Community Detection Algorithm Based on MapReduce

WANG Hancheng, DAI Haipeng, CHEN Zhipeng, CHEN Shusen, CHEN Guihai   

  1. State Key Laboratory for Novel Software Technology,Nanjing University,Nanjing 210023,China
  • Received:2023-11-06 Revised:2024-01-05 Online:2024-04-15 Published:2024-04-10
  • Supported by:
    National Natural Science Foundation of China(62272223,U22A2031,61872178).

Abstract: Community detection is a fundamental problem in the field of social network mining.With the rapid generation of massive data,traditional community detection algorithms are becoming increasingly difficult to handle large-scale social networks.Therefore,it is of great significance to design efficient community detection algorithms for large-scale networks.This paper proposes a new distributed algorithm based on MapReduce and k-center clustering.Firstly,the algorithm proposes the “friend circle coefficient” technique,which can measure the distance between nodes more accurately.Secondly,the algorithm proposes the “two-stage k-center clustering” technique,which incorporates node centrality heuristic information into the process of selecting center points and can significantly optimize the modularity of the results.Finally,the algorithm proposes a “community fusion method with modularity as the optimization goal” technique,which can automatically determine the number of communities in the network without prior knowledge.The evaluation results show that the proposed algorithm significantly outperforms the state-of-the-art community discovery algorithms in terms of modularity.For example,compared with the LPA algorithm,the proposed algorithm increases the modularity by an average of 9.19 times.

Key words: Community detection, k-center clustering, Distributed computing, Data mining, Big data

CLC Number: 

  • TP391
[1]NEWMAN M E J,GIRVAN M.Finding and evaluating community structure in networks[J].Physical Review E,2004,69(2):1-16.
[2]GAO Y,ZHANG H L.Survey on community detection method based on random walk[J].Journal on Communications,2023,44(6):198-210.
[3]SANGAIAH A K,REZAEI S,JAVADPOUR A,et al.Explai-nable AI in big data intelligence of community detection for digitalization e-healthcare services[J].Applied Soft Computing,2023,136(1):1-10.
[4]YANG H L,ZHAO X,CHEN C,et al.Community detection in social network based on node influence expansion[J].Journal of Harbin University of Science and Technology,2023,28(3):10-19.
[5]GHAREHCHOPOGH F S.An improved harris hawks optimization algorithm with multi-strategy for community detection in social network[J].Journal of Bionic Engineering,2023,20(3):1175-1197.
[6]NAIK D,RAMESH D,GOROJANAM N B.Enhanced link prediction using sentiment attribute and community detection[J].Journal of Ambient Intelligence and Humanized Computing,2023,14(4):4157-4174.
[7]MITTAL A,GOEL A.A review on community detection approach for recommender systems[C]//Proceedings of the International Conference on Communication and Electronics Systems.2023:1521-1526.
[8]AMIRA A,DERHAB A,KARBAB E B,et al.A survey of malware analysis using community detection algorithms[J].ACM Computing Surveys,2023,56(2):1-29.
[9]FORTUNATO S.Community detection in graphs[J].Physics Reports,2010,486(3):75-174.
[10]NEWMAN M E J.Fast algorithm for detecting communitystructure in networks[J].Physical Review E,2004,69(6):1-5.
[11]RAGHAVAN U N,ALBERT R,KUMARA S.Near linear time algorithm to detect community structures in large-scale networks[J].Physical Review E,2007,76(3):1-11.
[12]KERNIGHAN B W,LIN S.An efficient heuristic procedure for partitioning graphs[J].The BellSystem Technical Journal,1970,49(2):291-307.
[13]LUO Z G,DING F,JIANG X Z,et al.New progress on community detection in complex networks[J].Journal of National University of Defense Technology,2011,33(1):47-52.
[14]SUN H,LIU J,HUANG J,et al.LinkLPA:A link-based label propagation algorithm for overlapping community detection in networks[J].Computational Intelligence,2017,33(2):308-331.
[15]FLAKE G W,LAWRENCE S,GILES C L.Efficient identification of web communities[C]//Proceedings of the International Conference on Knowledge Discovery and Data Mining.2000:150-160.
[16]NEWMAN M E J.Modularity and community structure in networks[J].Proceedings of the National Academy of Sciences,2006,103(23):8577-8582.
[17]CLAUSET A,NEWMAN M E J,MOORE C.Finding community structure in very large networks[J].Physical Review E,2004,70(6):1-6.
[18]PIZZUTI C.Evolutionary computation for community detection in networks:a review[J].IEEE Transactions on Evolutionary Computation,2017,22(3):464-483.
[19]LIM S,KIM J,LEE J G.BlackHole:Robust community detection inspired by graph drawing[C]//Proceedings of the International Conference on Data Engineering.2016:25-36.
[20]SINGH A,GARG S,BATRA S,et al.Probabilistic data structure-based community detection and storage scheme in online social networks[J].Future Generation Computer Systems,2019,94(1):173-184.
[21]BLOOM B H.Space/time trade-offs in hash coding with allowable errors[J].Communications of the ACM,1970,13(7):422-426.
[22]BENDER M A,FARACH-COLTON M,JOHNSON R,et al.Don't thrash:How to cache your hash on flash[J].Proceedings of the VLDB Endowment,2012,5(11):1627-1637.
[23]SHIOKAWA H,FUJIWARA Y,ONIZUKA M.Fast algorithm for modularity-based graph clustering[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2013:1170-1176.
[24]RIEDY J,BADER D A,MEYERHENKE H.Scalable multi-threaded community detection in social networks[C]//Procee-dings of the International Parallel and Distributed Processing Symposium Workshops & PhD Forum.2012:1619-1628.
[25]YANG S,WANG B,ZHAO H,et al.Efficient dense structure mining using MapReduce[C]//Proceedings of the International Conference on Data Mining Workshops.2009:332-337.
[26]SHI J,XUE W,WANG W,et al.Scalable community detection in massive social networks using MapReduce[J].IBM Journal of Research and Development,2013,57(3):1-12.
[27]ZHANG Y,WANG J,WANG Y,et al.Parallel community de-tection on large networks with propinquity dynamics[C]//Proceedings of the International Conference on Knowledge Disco-very and Data Mining.2009:997-1006.
[28]WU Z,GAO G,BU Z,et al.SIMPLE:Simplifying-ensemblingframework for parallel community detection from large networks[J].Cluster Computing,2016,19(1):211-221.
[29]GONZALEZ T F.Clustering to minimize the maximum intercluster distance[J].Theoretical Computer Science,1985,38(C):293-306.
[30]ZACHARY W W.An information flow model for conflict andfission in small groups[J].Journal ofAnthropological Research,1977,33(4):452-473.
[31]LUSSEAU D,SCHNEIDER K,BOISSEAU O J,et al.The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations[J].Behavioral Ecology and Sociobiology,2003,54(4):396-405.
[32]GIRVAN M,NEWMAN M E J.Community structure in social and biological networks[J].Proceedings of the National Academy of Sciences,2002,99(12):7821-7826.
[33]LESKOVEC J,MCAULEY J J.Learning to discover social circles in ego networks[J].Advances in Neural Information Processing Systems,2012,1(1):539-547.
[34]ROZEMBERCZKI B,ALLEN C,SARKAR R.Multi-scale attributed node embedding[J].Journal of Complex Networks,2021,9(2):1-22.
[35]YANG J,LESKOVEC J.Defining and evaluating network communities based on ground-truth[J].Knowledge and Information Systems,2015,42(1):181-213.
[36]CHO E,MYERS S A,LESKOVEC J.Friendship and mobility[C]//Proceedings of the International Conference on Knowledge Discovery and Data Mining.2011:1082-1090.
[37]LESKOVEC J,KLEINBERG J,FALOUTSOS C.Graph evolu-tion:Densification and shrinking diameters[J].ACM Transactions on Knowledge Discovery from Data,2007,1(1):2-41.
[38]WANG H,DAI H,CHEN S,et al.Filterdata structures:A survey[J].Computer Science,2024,51(1):35-40.
[39]WANG H,DAI H,LI M,et al.Bamboo filters:Make resizing smooth[C]//Proceedings of International Conference on Data Engineering.2022:979-991.
[40]LI M,CHEN D,DAI H,et al.Seesaw counting filter:An efficient guardian for vulnerable negative keys during dynamic filtering[C]//Proceedings of the ACM Web Conference.2022:2759-2767.
[41]WANG H,DAI H,GU R,et al.Wormhole filters:Caching your hash on persistent memory[C]//Proceedings of European Conference on Computer Systems.2024:1-16.
[42]DAI H,YU J,LI M,et al.Bloom filter with noisy coding framework for multi-set membership testing[J].IEEE Transactions on Knowledge and Data Engineering,2022,35(7):6710-6724.
[43]DAI H,LI M,LIU A X,et al.Finding persistent items in distributed datasets[J].IEEE/ACM Transactions on Networking,2020,28(1):1-14.
[44]DAI H,SHAHZAD M,LIU A X,et al.Identifying and estimating persistent items in data streams[J].IEEE/ACM Transactions on Networking,2018,26(6):2429-2442.
[45]XIA R,DAI H,DU Z,et al.Mining robust frequent items in data streams[C]//Proceedings of International Conference on Joint Cloud Computing.2020:110-117.
[46]LIU J,DAI H,XIA R,et al.DUET:A generic framework forfinding special quadratic elements in data streams[C]//Procee-dings of ACM Web Conference.2022:2989-2997.
[47]LI M,DAI H,WANG X,et al.Thresholded monitoring in distributed data streams[C]//Proceedings of International Confe-rence on Distributed Computing Systems.2019:218-227.
[48]DAI H,SHAHZAD M,LIU A X,et al.Finding persistent items in data streams[J].Proceedings of the VLDB Endowment,2016,10(4):289-300.
[49]FU P,LUO L,GUO D,et al.Jump filter:Dynamic sketch design for big data governance[J].Journal of Software,2023,34(3):1193-1212.
[50]XIE K,WEN J,ZHANG D,et al.Bloom filter query algorithm[J].Journal of Software,2009,20(1):96-108.
[51]YU S,LI X,WANG H,et al.C_CART:An instance confidence-based decision tree algorithm for classification[J].Intelligent Data Analysis,2021,25(4):929-948.
[52]YU S,LI X,WANG H,et al.BIDI:A classification algorithm with instance difficulty invariance[J].Expert Systems with Applications,2021,165(1):1-13.
[53]DAI H,LI M,WANG H,et al.Distantly supervised entity lin-king with selection consistency constraint[C]//Proceedings of International Conference on Database Systems for Advanced Applications.2023:784-799.
[54]DAI M,XU S,SHAO S,et al.Blockchainbased reliable fog cloud service solution for IIoT[J].Chinese Journal of Electronics,2021,30(2):359-366.
[55]Li P,YU X,XU H,et al.Secure localization technology based on dynamic trust management in wireless sensor networks[J].Chinese Journal of Electronics,2021,30(4):759-768.
[56]LI L,WU J,HU H,et al.Secure cloud architecture for 5G core network[J].Chinese Journal of Electronics,2021,30(3):516-522.
[57]DAI H,WANG H,CHEN Z,et al.Variable-length encodingframework:A generic framework for enhancing the accuracy of approximate membership queries[C]//Proceedings of the 23rd International Conference on Data Mining.2023:61-70.
[58]LI M,WANG H,DAI H,et al.A survey of multi-dimensional indexes:Past and future trends[J].IEEE Transactions on Knowledge and Data Engineering,2024,36(1):1-20.
[59]GU R,LI S,DAI H,et al.Adaptive online cache capacity optimization via lightweight working set size estimation at scale[C]//Proceedings of USENIX Annual Technical Conference.2023:467-484.
[60]DAI H,ZHONG Y,LIU A,et al.Noisy bloom filters for multi-set membership testing[C]//Proceedings of International Conference on Measurement and Modeling of Computer Science.2016:139-151.
[61]LI Y,WANG Z,YANG R,et al.Learned bloom filter for multi-key membership testing[C]//Proceedings of the International Conference on Database Systems for Advanced Applications.2023:62-79.
[62]DAYAN N,BERCEA I,REVIRIEGO P,et al.InfiniFilter:Expanding filters to infinity and beyond[J].Proceedings of the ACM on Management of Data,2023,1(2):1-27.
[63]PANDEY P,CONWAY A,DURIE J,et al.Vector quotient filters:Overcoming the time/space trade-off in filter design[C]//Proceedings of the International Conference on Management of Data.2021:1386-1399.
[64]PANDEY P,BENDER M A,JOHNSON R,et al.A general-purpose counting filter:Making every bit count[C]//Proceedings of the International Conference on Management of Data.2017:775-787.
[65]HUANG K,YANG T.Tagged cuckoo filters[C]//Proceedings of the International Conference on Computer Communications and Networks.2021:1-10.
[66]ZENG Z,XIAO R,LIN X,et al.Double locality sensitive ha-shing Bloom filter for high-dimensional streaming anomaly detection[J].Information Processing and Management,2023,60(3):1-18.
[67]QU Y,SUN H,DONG C,et al.Elastic collaborative edge intelligence for UAV swarm:Architecture,challenges,and opportunities[J].IEEE Communications Magazine,2024,62(1):62-68.
[68]HE S,SHI K,LIU C,et al.Collaborative sensing in internet of things:A comprehensive survey[J].IEEE Communications Surveys & Tutorials,2022,24(3):1435-1474.
[69]DENG X,JIANG Y,YANG L T,et al.Data fusion based cove-rage optimization in heterogeneous sensor networks:A survey[J].Information Fusion,2019,52(1):90-105.
[1] CHEN Xinyang, CHEN Hanze, ZHOU Jiasheng, HUANG Jiaqing, YU Jiashuo, ZHU Longlong, ZHANG Dong. IntervalSketch:Approximate Statistical Method for Interval Items in Data Stream [J]. Computer Science, 2024, 51(4): 4-10.
[2] CHEN Pan, CHEN Hongmei, LUO Chuan. Academic Influence Ranking Algorithm Based on Topic Reputation and Dynamic HeterogeneousNetwork [J]. Computer Science, 2024, 51(3): 81-89.
[3] YAN Jiahe, LI Honghui, MA Ying, LIU Zhen, ZHANG Dalin, JIANG Zhouxian, DUAN Yuhang. Multi-source Heterogeneous Data Fusion Technologies and Government Big Data GovernanceSystem [J]. Computer Science, 2024, 51(2): 1-14.
[4] SHEN Zhehui, WANG Kailai, KONG Xiangjie. Exploring Station Spatio-Temporal Mobility Pattern:A Short and Long-term Traffic Prediction Framework [J]. Computer Science, 2023, 50(7): 98-106.
[5] ZHOU Zhiqiang, ZHU Yan. Local Community Detection Algorithm for Attribute Networks Based on Multi-objective Particle Swarm Optimization [J]. Computer Science, 2023, 50(6A): 220200015-6.
[6] ZHANG Jian, ZHANG Ye. College Students Employment Dynamic Prediction of Multi-feature Fusion Based on GRU-LSTM [J]. Computer Science, 2023, 50(6A): 220500056-6.
[7] FAN Shuhuan, HOU Mengshu. Dataspace:A New Data Organization and Management Model [J]. Computer Science, 2023, 50(5): 115-127.
[8] HU Xuegang, LI Yang, WANG Lei, LI Peipei, YOU Zhuhong. Key Technologies of Intelligent Identification of Biomarkers:Review of Research on Association Prediction Between Circular RNA and Disease [J]. Computer Science, 2023, 50(4): 369-387.
[9] ZENG Xiangyu, LONG Haixia, YANG Xuhua. Community Detection Based on Markov Similarity Enhancement and Network Embedding [J]. Computer Science, 2023, 50(4): 56-62.
[10] JIANG Chuanyu, HAN Xiangyu, YANG Wenrui, LYU Bohan, HUANG Xiaoou, XIE Xia, GU Yang. Survey of Medical Knowledge Graph Research and Application [J]. Computer Science, 2023, 50(3): 83-93.
[11] HAN Qiqi, LIU Xin. Application of Air-Sea Coupled Mode in High-speed Interconnection Environment [J]. Computer Science, 2023, 50(11A): 221000136-5.
[12] ZHAO Xuejian, ZHAO Ke. Bio-inspired Frequent Itemset Mining Strategy Based on Genetic Algorithm [J]. Computer Science, 2023, 50(11A): 220700200-8.
[13] MA Wensheng, HOU Xilin, WANG Hongbo, LIU Sen. Study on Value Calculation of Big Data Based on Granular Tree and Usage Relationship [J]. Computer Science, 2023, 50(11A): 230300109-8.
[14] LU Mingchen, LYU Yanqi, LIU Ruicheng, JIN Peiquan. Fast Storage System for Time-series Big Data Streams Based on Waterwheel Model [J]. Computer Science, 2023, 50(1): 25-33.
[15] HE Qiang, YIN Zhen-yu, HUANG Min, WANG Xing-wei, WANG Yuan-tian, CUI Shuo, ZHAO Yong. Survey of Influence Analysis of Evolutionary Network Based on Big Data [J]. Computer Science, 2022, 49(8): 1-11.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!