计算机科学 ›› 2022, Vol. 49 ›› Issue (9): 83-91.doi: 10.11896/jsjkx.220400146

• 数据库&大数据&数据科学* 上一篇    下一篇

一种基于节点稳定性和邻域相似性的社区发现算法

郑文萍1,2,3, 刘美麟1, 杨贵1   

  1. 1 山西大学计算机与信息技术学院 太原 030006
    2 山西大学计算智能与中文信息处理教育部重点实验室 太原 030006
    3 山西大学智能信息处理研究所 太原 030006
  • 收稿日期:2022-04-14 修回日期:2022-06-03 出版日期:2022-09-15 发布日期:2022-09-09
  • 通讯作者: 郑文萍(wpzheng@sxu.edu.cn)
  • 基金资助:
    国家自然科学基金(62072292);山西省1331工程项目

Community Detection Algorithm Based on Node Stability and Neighbor Similarity

ZHENG Wen-ping1,2,3, LIU Mei-lin1, YANG Gui1   

  1. 1 School of Computer & Information Technology,Shanxi University,Taiyuan 030006,China
    2 Key Laboratory of Computation Intelligence and Chinese Information Processing,Ministry of Education,Shanxi University,Taiyuan 030006,China
    3 Institute of Intelligent Information Processing,Shanxi University,Taiyuan 030006,China
  • Received:2022-04-14 Revised:2022-06-03 Online:2022-09-15 Published:2022-09-09
  • About author:ZHENG Wen-ping,born in 1979,Ph.D,professor,Ph.D supervisor,is a member of China Computer Federation.Her main research interests include network science and bioinformatics,etc.
  • Supported by:
    National Natural Science Foundation of China(62072292) and 1331 Engineering Project of Shanxi Province,China.

摘要: 复杂网络规模的增大导致网络中社区结构变得复杂,节点与社区之间的关系更多样化,有效度量大规模网络中节点邻域的社区构成,并对社区归属确定性有差异的节点分别进行处理,可以提高算法的社区发现质量。基于此,提出了一种基于节点稳定性和邻域相似性的社区发现算法(Node Stability and Neighbor Similarity Based Community Detection Algorithm,NSNSA)。首先定义节点的标签熵并对节点在社区发现过程中的稳定性进行度量,选择标签熵较低的节点作为稳定节点集;其次根据节点邻域的标签构成情况定义节点的邻域相似性,对节点与其邻居节点的社区归属一致性进行度量;然后利用稳定节点与其直接邻居中邻域相似性最高的节点构造初始网络,并在该子网络上运行标签传播算法,以得到可靠性较高的初始社区发现结果;最后将未聚类节点分配至与其Katz相似性最高的节点所在的社区,对小规模社区进行合并处理,以得到最终的社区划分结果。在真实网络及人工网络数据集上,与LPA,BGLL,Walktrap,Infomap,LPA-S等经典社区发现算法的对比实验表明,NSNSA算法在模块度以及标准互信息方面表现良好。

关键词: 复杂网络, 社区结构, 标签熵, 节点稳定性, 邻域相似性

Abstract: With the increase of the scale of complex network,community structure becomes more complex.The relationship between nodes and communities become more diversified.It is expected to improve community detection algorithm performance by effectively measuring the community structure and dealing with the nodes with different certainty of community belonging.This paper proposes a community detection algorithm based on node stability and neighbor similarity.Firstly,label entropy of nodes is defined to measure node stability and the nodes with low label entropy are selected as stable node sets.Then the neighbor simila-rity is defined according to the label of node neighbor and the community belonging consistency of nodes and their neighbors is measured.The initial network is constructed by using the node with the highest neighbor similarity between the stable node and its neighbor,and the initial community detection results with high reliability are obtained by running label propagation algorithm on the subnetwork.The unclustered nodes are allocated to the community of the node with the highest Katz similarity.The final result of community detection is obtained by merging small-scale communities.Compared with LPA,BGLL,Walktrap,Infomap,LPA-S and other classical algorithms,experimental results show that the NSNSA algorithm performs well in modularity and NMI.

Key words: Complex network, Community structure, Label entropy, Node stability, Neighbor similarity

中图分类号: 

  • TP391
[1]NEWMAN M E J.Networks:An Introduction[M].Oxford University Press,2010.
[2]JIA S W,GAO L,GAO Y,et al.Exploring triad-rich substructures by graph-theoretic characterizations in complex networks[J].Physica A:Statistical Mechanics and its Applications,2017,468(2017):53-69.
[3]HIDEHIKO I,MINEICHI K,ATSUYOSHI N.Partitioning of web graphs by community topology[C]//Proceedings of the 14th International Conference on World Wide.New York,ACM Press,2005:661-669.
[4]DENG K,ZHANG J P,YANG J.Mobile recommendation based on link community detection[J].The Scientific World Journal,2014:8(26):259156.
[5]FARUTIN V,ROBISON K,LIGHTCAP E,et al.Edge Count probabilities for the identification of local protein communities and their organization[J].Proteins:Structure,Function,and Bioinformatics,2006,62(3):800-818.
[6]NEWMAN M E J,GIRVAN M.Finding and evaluating community structure in networks[J].Physical Review E,2004,69(2):026113.
[7]BLONDEL V D,GUILLAUME J L,LAMBIOTTE R,et al.Fast unfolding of communities in large networks[J].Journal of Statistical Mechanics:Theory and Experiment,2008,2008(10):P10008.
[8]TRAAG V A,WALTMAN L,VAN ECK N J.From Louvain to Leiden:guaranteeing well-connected communities[J].Scientific Reports,2019,9(1):1-12.
[9]FORTUNATO S,BARTHELEMY M.Resolution limit in community detection[J].Proceedings of the National Academy of Sciences of the United States of America,2007,104(1):36-41.
[10]RAGHAVAN U N,ALBERT R,KUMARA S.Near linear time algorithm to detect community structures in large-scale networks[J].Physical Review E,2007,76(3):036106.
[11]WANG T,CHEN S S,WANG X X,et al.Label propagation algorithm based on node importance[J].Physica A:Statistical Mechanics and its Applications,2020,551(2020):124137.
[12]LI W,HUANG C,WANG M,et al.Stepping community detection algorithm based on label propagation and similarity[J].Physica A:statistical Mechanics and Its Applications,2017,472:145-155.
[13]ROSVALL M,BERGSTROM C T.Maps of random walks on complex networks reveal community structure[J].Proceedings of the National Academy of Sciences of the United States of America,2008,105(4):1118-1123.
[14]PONS P,LATAPY M.Computing communities in large net-works using random walks[J].Journal of Graph Algorithms and Applications,2006,10(2):191-218.
[15]ZHENG W P,CHE C H,QIAN Y H,et al.A two-stage community detection algorithm based on label propagation[J].Journal of Computer Research and Development,2018,55(9):1959-1971.
[16]LANCICHINETTI A,FORTUNATO S,KERTESZ J.Detec-ting the overlapping and hierarchical community structure in complex networks[J].New Journal of Physics,2009,11(3):033015.
[17]SHEN H W,CHENG X Q,CAI K,et al.Detect overlapping and hierarchical community structure in networks[J].Physica A:Statistical Mechanics and its Applications,2009,388(8):1706-1712.
[18]ZACHARY W W.An information flow model for conflict and fission in small groups[J].Journal of Anthropological Research,1977,33(4):452-473.
[19]LUSSEAU D,NEWMAN M E J.Identifying the role that animals play in their social networks[J].Proceedings of the Royal Society B:Biological Sciences,2004,271(Suppl 6):S477-S481.
[20]GIRVAN M,NEWMAN M E J.Community structure in social and biological networks[J].Proceedings of the National Academy of Sciences of the United States of America,2002,99(12):7821-7826.
[1] 杨波, 李远彪.
数据科学与大数据技术课程体系的复杂网络分析
Complex Network Analysis on Curriculum System of Data Science and Big Data Technology
计算机科学, 2022, 49(6A): 680-685. https://doi.org/10.11896/jsjkx.210800123
[2] 何茜, 贺可太, 王金山, 林绅文, 杨菁林, 冯玉超.
比特币实体交易模式分析
Analysis of Bitcoin Entity Transaction Patterns
计算机科学, 2022, 49(6A): 502-507. https://doi.org/10.11896/jsjkx.210600178
[3] 王本钰, 顾益军, 彭舒凡, 郑棣文.
融合动态距离和随机竞争学习的社区发现算法
Community Detection Algorithm Based on Dynamic Distance and Stochastic Competitive Learning
计算机科学, 2022, 49(5): 170-178. https://doi.org/10.11896/jsjkx.210300206
[4] 陈世聪, 袁得嵛, 黄淑华, 杨明.
基于结构深度网络嵌入模型的节点标签分类算法
Node Label Classification Algorithm Based on Structural Depth Network Embedding Model
计算机科学, 2022, 49(3): 105-112. https://doi.org/10.11896/jsjkx.201000177
[5] 赵学磊, 季新生, 刘树新, 李英乐, 李海涛.
基于路径连接强度的有向网络链路预测方法
Link Prediction Method for Directed Networks Based on Path Connection Strength
计算机科学, 2022, 49(2): 216-222. https://doi.org/10.11896/jsjkx.210100107
[6] 李家文, 郭炳晖, 杨小博, 郑志明.
基于信息传播的致病基因识别研究
Disease Genes Recognition Based on Information Propagation
计算机科学, 2022, 49(1): 264-270. https://doi.org/10.11896/jsjkx.201100129
[7] 穆俊芳, 郑文萍, 王杰, 梁吉业.
基于重连机制的复杂网络鲁棒性分析
Robustness Analysis of Complex Network Based on Rewiring Mechanism
计算机科学, 2021, 48(7): 130-136. https://doi.org/10.11896/jsjkx.201000108
[8] 胡军, 王雨桐, 何欣蔚, 武晖栋, 李慧嘉.
基于复杂网络的全球航空网络结构分析与应用
Analysis and Application of Global Aviation Network Structure Based on Complex Network
计算机科学, 2021, 48(6A): 321-325. https://doi.org/10.11896/jsjkx.200900112
[9] 王学光, 张爱新, 窦炳琳.
复杂网络上的非线性负载容量模型
Non-linear Load Capacity Model of Complex Networks
计算机科学, 2021, 48(6): 282-287. https://doi.org/10.11896/jsjkx.200700040
[10] 马媛媛, 韩华, 瞿倩倩.
基于节点亲密度的重要性评估算法
Importance Evaluation Algorithm Based on Node Intimate Degree
计算机科学, 2021, 48(5): 140-146. https://doi.org/10.11896/jsjkx.200300184
[11] 殷子樵, 郭炳晖, 马双鸽, 米志龙, 孙怡帆, 郑志明.
群智体系网络结构的自治调节:从生物调控网络结构谈起
Autonomous Structural Adjustment of Crowd Intelligence Network: Begin from Structure of Biological Regulatory Network
计算机科学, 2021, 48(5): 184-189. https://doi.org/10.11896/jsjkx.210200161
[12] 何彬, 许道云.
正则(3,4)-CNF公式的社区结构
Community Structure of Regular (3,4)-CNF Formula
计算机科学, 2021, 48(4): 26-30. https://doi.org/10.11896/jsjkx.201000178
[13] 刘胜久, 李天瑞, 谢鹏, 刘佳.
带权图的多重分形度量
Measure for Multi-fractals of Weighted Graphs
计算机科学, 2021, 48(3): 136-143. https://doi.org/10.11896/jsjkx.200700159
[14] 龚追飞, 魏传佳.
基于改进AdaBoost算法的复杂网络链路预测
Link Prediction of Complex Network Based on Improved AdaBoost Algorithm
计算机科学, 2021, 48(3): 158-162. https://doi.org/10.11896/jsjkx.200600075
[15] 龚追飞, 魏传佳.
基于拓扑相似和XGBoost的复杂网络链路预测方法
Complex Network Link Prediction Method Based on Topology Similarity and XGBoost
计算机科学, 2021, 48(12): 226-230. https://doi.org/10.11896/jsjkx.200800026
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!