计算机科学 ›› 2019, Vol. 46 ›› Issue (10): 39-48.doi: 10.11896/jsjkx.180901644

• 大数据与数据科学* • 上一篇    下一篇

基于拓扑结构的密度峰值重叠社区发现算法

封云飞1, 陈红梅1,2   

  1. (西南交通大学信息科学与技术学院 成都611756)1
    (西南交通大学云计算与智能技术高校重点实验室 成都611756)2
  • 收稿日期:2018-09-04 修回日期:2018-11-14 出版日期:2019-10-15 发布日期:2019-10-21
  • 通讯作者: 陈红梅(1971-),女,教授,博士生导师,主要研究方向为粒计算与粗糙集、智能信息处理、数据挖掘等,E-mail:hmchen@swjtu.edu.cn。
  • 作者简介:封云飞(1995-),男,硕士生,主要研究方向为数据挖掘、机器学习等。
  • 基金资助:
    本文受国家自然科学基金(61572406)资助。

Topological Structure Based Density Peak Algorithm for Overlapping Community Detection

FENG Yun-fei1, CHEN Hong-mei1,2   

  1. (School of Information Science and Technology,Southwest Jiaotong University,Chengdu 611756,China)1
    (Key Laboratory of Cloud Computing and Intelligent Technology,Southwest Jiaotong University,Chengdu 611756,China)2
  • Received:2018-09-04 Revised:2018-11-14 Online:2019-10-15 Published:2019-10-21

摘要: 现代网络科学的不断发展,为人们的生活提供了极大的便利。对复杂网络的研究是推动现代网络科学发展的重要动力,而社区是研究复杂网络的重要结构。已有的社区发现方法大多是高度复杂的,这不利于有效挖掘复杂网络。为了研究更高效的社区发现算法,文中将近年来被提出的密度峰值聚类算法应用于社区发现中,对密度峰值算法进行改进,提出了一种高效的社区发现算法。将密度峰值算法应用于社区发现存在一些问题,由于复杂网络数据结构具有特殊性,其数据大多以拓扑图或邻接矩阵的形式存储,因此将密度峰值聚类算法应用到社区发现中的核心问题是如何有效地计算网络中各节点间的距离、节点局部密度和选择中心节点。针对该问题,文中通过网络拓扑图中各节点及其邻居节点的度来计算每一个节点的局部密度,通过节点间的相似度来度量节点间的距离,并对距离进行离散化处理,以便选取社区中心节点;定义了核心跳变值来更精确地选取社区中心,防止大社区吞并小社区;基于LFR人工网络和真实网络数据集,将所提算法与已有算法进行比较,并采用扩展的模块度、调整兰德系数以及归一化互信息对实验结果进行评估。真实网络中的实验结果表明了所提算法具有不错的效果,且在一些真实场景中具有明显优势;在人工网络中,所提算法同样具有优势,同时其相比其他算法更加稳定。

关键词: 密度峰值, 社区发现, 数据挖掘, 拓扑结构, 重叠社区

Abstract: With the continuous development of modern network science,people’s lives have been greatly facilitated.The study on complex networks is an important driving force for the development of modern network science,and the community is an important structure for research on complex networks.Most of the existing community discovery methods are highly complex,so this paper applied the density peak clustering algorithm proposed in recent years to community discovery,and proposed an efficient community discovery algorithm.Due to the particularity of the data structure of complex network,the complex network data are stored in the form of topological graphorad jacency matrix,and thus how to effectively calculate the distance between nodes and the local density of each node,and how to select the core nodes of the community are the key problems when density peak clustering algorithm is applied tocommunity discovery.In light of this,this paper calculated the local density of each node by the degree of each node and its neighbor nodes in the network topological graph,measured the distance between nodes by the similarity between nodes,and discretized the distance to expediently select the core nodes for this algorithm.Moreover,the core hop values were defined to accurately select community centers,thus avoiding the large communities annex small communities.Experiments were carried out on the LFR artificial network dataset and the real network dataset with the evaluation indexes of the extended modularity,the adjusted rand coefficient and the normalized mutual information.Experimental results in real networks show that the proposed algorithm has good effects and obvious advantages in some real networks compared with other algorithms.In the artificial network,the algorithm also has advantages.Therefore,the proposed algorithm is more stable than other algorithms.

Key words: Community detection, Data mining, Density peak, Overlapping community, Topological structure

中图分类号: 

  • TP391
[1]FORTUNATO S.Community detection in graphs[J].Physics Reports-Review Section of Physics Letters,2010,486(3):75-174.
[2]NEWMAN M E J.The structure and function of complex networks[J].Siam Review,2003,45(2):167-256.
[3]YU Z D,YU H Q.Micro-Blog user recommendation based on community discovery and topic analysis[J].Journal of East China University of Science and Technology(Natural Science Edition),2014,40(6):763-768.(in Chinese)
余紫丹,虞慧群.基于社区发现及主题分析的微博用户推荐[J].华东理工大学学报(自然科学版),2014,40(6):763-768.
[4]MA F M,WANG G.Method for commodity recommendation based on user community[J].Computer & Digital Engineering,2013,41(8):1354-1356.(in Chinese)
麻风梅,王刚.基于用户社区的商品推荐方法[J].计算机与数字工程,2013,41(8):1354-1356.
[5]PAN W F,LI B,SHAO B,et al.Service classification and re-commendation based on software networks[J].Chinese Journal of Computers,2011,34(12):2355-2369.(in Chinese)
潘伟丰,李兵,邵波,等.基于软件网络的服务自动分类和推荐方法研究[J].计算机学报,2011,34(12):2355-2369.
[6]GIRVAN M,NEWMAN M E J.Community structure in social and biological networks[J].Proceedings of National Academy of Science,2002,99(12):7821-7826.
[7]NEWMAN M E J.Fast algorithm for detecting community structure in networks[J].Physical Review E Statistical Nonli-near and Soft Matter Physics,2004,69(6):066133.
[8]DERENYI I,PALLA G,VICSEK T.Clique percolation in random networks[J].Physical review letters,2005,94(16):160202.
[9]CLAUSET A,NEWMAN M E J,MOORE C.Finding community structure in very large networks[J].Physical Review E Statistical Nonlinear and Soft Matter Physics,2004,70(6):066111.
[10]DANON L,DIAZGUILERA A,ARENAS A.Effect of size heterogeneity on community identification in complex networks[J].Journal of Statistical Mechanics:Theory and Experiment,2006,2006(11):P11010.
[11]ZHU X,GHAHRAMANI Z.Learning from labeled and unlabeled data[J].Technology Report,2002,3175(2004):237-244.
[12]LIU S C,ZHU F X,GAN L.A label-propagation-probability-based algorithm for overlapping community detection[J].Chinese Journal of Computers,2016,39(4):717-729.(in Chinese)
刘世超,朱福喜,甘琳.基于标签传播概率的重叠社区发现算法[J].计算机学报,2016,39(4):717-729.
[13]ZHANG X K,REN J,SONG C,et al.Label propagation algorithm for community detection based on node importance and label influence[J].Physics Letters A,2017,381(33):2691-2698.
[14]LI L,JIAO L,ZHAO J,et al.Quantum-behaved discrete multi-objective particle swarm optimization for complex network clustering[J].Pattern Recognition,2017,63:1-14.
[15]LIU Q,ZHOU B,LI S,et al.Community detection utilizing a novel multi-swarm fruit fly optimization algorithm with hill-climbing strategy[J].Arabian Journal for Science & Engineering,2016,41(3):807-828.
[16]JIANG S Y,YANG B H,WANG L X.An adaptive dynamic community detection algorithm based on incremental spectral clustering[J].Acta Automatica Sinica,2015,41(12):2017-2025.(in Chinese)
蒋盛益,杨博泓,王连喜.一种基于增量式谱聚类的动态社区自适应发现算法[J].自动化学报,2015,41(12):2017-2025.
[17]ZHOU X,LIU Y,WANG J,et al.A density based link clustering algorithm for overlapping community detection in networks[J].Physica A Statistical Mechanics & Its Applications,2017,486(2017):65-78.
[18]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
[19]YANG J,WANG G Y,PANG Z L.Relative researches of clustering by fast search and find of density peaks[J].Journal of Nanjing University(Natural Science),2017,53(4):791-801.(in Chinese)
杨洁,王国胤,庞紫玲.密度峰值聚类相关问题的研究[J].南京大学学报(自然科学版),2017,53(4):791-801.
[20]SHI X H,FENG G X,LI M,et al.Overlapping community detection method based on density peaks[J].Journal of Jilin University(Engineering and Technology Edition),2017,47(1):242-248.(in Chinese)
时小虎,冯国香,李牧,等.基于密度峰值的重叠社区发现算法.吉林大学(工学版),2017,47(1):242-248.
[21]HUANG L,LI Y,WANG G S,et al.Community detection method based on vertex distance and clustering of density peaks[J].Journal of Jilin University(Engineering and Technology Edition),2016,46(6):2042-2051.(in Chinese)
黄岚,李玉,王贵参,等.基于点距离和密度峰值聚类的社区发现方法[J].吉林大学学报(工学版),2016,46(6):2042-2051.
[22]HENNIG C,HAUSDORF B.Design of dissimilarity measures:A new dissimilarity between species distribution areas[M]//Data Science And Classification.Berlin Heidelberg:Springer,2006:29-37.
[23]SHEN H,CHENG X,CAI K,et al.Detect overlapping and hie-rarchical community structure in networks[J].Physica A Statistical Mechanics & Its Applications,2009,388(8):1706-1712.
[24]SANTOS J M,EMBRECHTS M.On the use of the adjusted rand index as a metric for evaluating supervised classification[C]//Proceeding of the 19th International Conference on Artificial Neural Networks.Berlin Heidelberg:Springer,2009:175-184.
[25]DANON L,DIAZGUILERA A,DUCH J,et al.Comparing community structure identification[J].Journal of Statistical Mechanics:Theory and Experiment,2005,2005(9):P09008.
[26]LANCICHINETTI A,FORTUNATO S.Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities[J].Physical Review E Statistical Nonlinear & Soft Matter Physics,2009,80(1):016118.
[27]BAI X,YANG P,SHI X.An overlapping community detection algorithm based on density peaks[J].Neurocomputing,2017,226:7-15.
[28]HUANG L,WANG G,WANG Y,et al.A link density clustering algorithm based on automatically selecting density peaks for overlapping community detection[J].International Journal of Modern Physics B,2016,30(24):1650167.
[29]GAITERI C,CHEN M M,SZYMANSKI K,et al.Identifying robust communities and multi-community nodes by combining top-down and bottom-up approaches to clustering[J].Scientific Reports,2015,5:16361.
[30]XIE J,SZYMANSKI B K,LIU X.SLPA:Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process[C]//Proceedings of the 11th IEEE International Conference of Data Mining Workshops.Washington,CD:IEEE,2011:344-349.
[1] 黎嵘繁, 钟婷, 吴劲, 周帆, 匡平.
基于时空注意力克里金的边坡形变数据插值方法
Spatio-Temporal Attention-based Kriging for Land Deformation Data Interpolation
计算机科学, 2022, 49(8): 33-39. https://doi.org/10.11896/jsjkx.210600161
[2] 何亦琛, 毛宜军, 谢贤芬, 古万荣.
基于点割集图分割的矩阵变换与分解的推荐算法
Matrix Transformation and Factorization Based on Graph Partitioning by Vertex Separator for Recommendation
计算机科学, 2022, 49(6A): 272-279. https://doi.org/10.11896/jsjkx.210600159
[3] 王本钰, 顾益军, 彭舒凡, 郑棣文.
融合动态距离和随机竞争学习的社区发现算法
Community Detection Algorithm Based on Dynamic Distance and Stochastic Competitive Learning
计算机科学, 2022, 49(5): 170-178. https://doi.org/10.11896/jsjkx.210300206
[4] 唐春阳, 肖玉芝, 赵海兴, 冶忠林, 张娜.
面向双层网络的EWCC社区发现算法
EWCC Community Discovery Algorithm for Two-Layer Network
计算机科学, 2022, 49(4): 49-55. https://doi.org/10.11896/jsjkx.210800275
[5] 么晓明, 丁世昌, 赵涛, 黄宏, 罗家德, 傅晓明.
大数据驱动的社会经济地位分析研究综述
Big Data-driven Based Socioeconomic Status Analysis:A Survey
计算机科学, 2022, 49(4): 80-87. https://doi.org/10.11896/jsjkx.211100014
[6] 杨旭华, 王磊, 叶蕾, 张端, 周艳波, 龙海霞.
基于节点相似性和网络嵌入的复杂网络社区发现算法
Complex Network Community Detection Algorithm Based on Node Similarity and Network Embedding
计算机科学, 2022, 49(3): 121-128. https://doi.org/10.11896/jsjkx.210200009
[7] 孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉.
基于差分隐私的K-means算法优化研究综述
Review of K-means Algorithm Optimization Based on Differential Privacy
计算机科学, 2022, 49(2): 162-173. https://doi.org/10.11896/jsjkx.201200008
[8] 张亚迪, 孙悦, 刘锋, 朱二周.
结合密度参数与中心替换的改进K-means算法及新聚类有效性指标研究
Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index
计算机科学, 2022, 49(1): 121-132. https://doi.org/10.11896/jsjkx.201100148
[9] 马董, 李新源, 陈红梅, 肖清.
星型高影响的空间co-location模式挖掘
Mining Spatial co-location Patterns with Star High Influence
计算机科学, 2022, 49(1): 166-174. https://doi.org/10.11896/jsjkx.201000186
[10] 陈湘涛, 赵美杰, 杨梅.
基于子图结构的局部社区发现算法
Overlapping Community Detection Algorithm Based on Subgraph Structure
计算机科学, 2021, 48(9): 244-250. https://doi.org/10.11896/jsjkx.201100010
[11] 徐慧慧, 晏华.
基于相对危险度的儿童先心病风险因素分析算法
Relative Risk Degree Based Risk Factor Analysis Algorithm for Congenital Heart Disease in Children
计算机科学, 2021, 48(6): 210-214. https://doi.org/10.11896/jsjkx.200500082
[12] 张岩金, 白亮.
一种基于符号关系图的快速符号数据聚类算法
Fast Symbolic Data Clustering Algorithm Based on Symbolic Relation Graph
计算机科学, 2021, 48(4): 111-116. https://doi.org/10.11896/jsjkx.200800011
[13] 汤鑫瑶, 张正军, 储杰, 严涛.
基于自然最近邻的密度峰值聚类算法
Density Peaks Clustering Algorithm Based on Natural Nearest Neighbor
计算机科学, 2021, 48(3): 151-157. https://doi.org/10.11896/jsjkx.200100112
[14] 张寒烁, 杨冬菊.
基于关系图谱的科技数据分析算法
Technology Data Analysis Algorithm Based on Relational Graph
计算机科学, 2021, 48(3): 174-179. https://doi.org/10.11896/jsjkx.191200154
[15] 邹承明, 陈德.
高维大数据分析的无监督异常检测方法
Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis
计算机科学, 2021, 48(2): 121-127. https://doi.org/10.11896/jsjkx.191100141
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!