计算机科学 ›› 2021, Vol. 48 ›› Issue (11A): 236-243.doi: 10.11896/jsjkx.210300205

• 大数据&数据科学 • 上一篇    下一篇

基于KL-Ball的社区挖掘方法

娄铮铮, 王冠威, 李辉, 吴云鹏   

  1. 郑州大学信息工程学院 郑州450001
  • 出版日期:2021-11-10 发布日期:2021-11-12
  • 通讯作者: 吴云鹏(ieypwu@zzu.edu.cn)
  • 作者简介:zzlou@zzu.edu.cn
  • 基金资助:
    国家自然科学基金(62002330)

Community Mining Based on KL-Ball

LOU Zheng-zheng, WANG Guan-wei, LI Hui, WU Yun-peng   

  1. School of Information Engineering,Zhengzhou University,Zhengzhou 450001,China
  • Online:2021-11-10 Published:2021-11-12
  • About author:LOU Zheng-zheng,born in 1984,Ph.D,associate professor,is a member of China Computer Federation.His main research interests include machine learning,pattern recognition and compute.
    WU Yun-peng,born in 1987,Ph.D,lecturer.His main research interests include computer vision and pattern re-cognition.
  • Supported by:
    National Natural Science Foundation of China(62002330).

摘要: 针对邻接矩阵的稀疏特性,采用KL散度来计算网络节点间的距离,提出了一种基于KL-Ball的社区挖掘方法。该方法中,一个KL-Ball代表一个社区,它从质心、半径、互信息及密度4个方面来描述社区,其中质心决定了社区在网络中的位置,半径刻画了社区所能覆盖的范围,互信息度量了社区中包含节点的一致性,密度反映了社区包含节点的数量。给定一个半径,期望从复杂网络中寻找具有低信息、高密度的社区,低信息使得社区包含的节点具有较强的一致性,高密度使得一个社区具有较强的凝聚性。为此,定义了一个基于KL-Ball的社区挖掘目标函数,给出它的优化算法,并从理论上证明了该算法的收敛性。依据社区半径的大小及质心的位置,该算法可应用于非重叠社区挖掘以及重叠社区挖掘。实验结果表明,基于KL-Ball的社区挖掘方法可有效地挖掘网络中蕴含的社区结构,包括非重叠的社区及重叠的社区。

关键词: KL散度, 非重叠社区, 社区挖掘, 重叠社区

Abstract: This paper presents a new community mining method where each community is viewed as a KL-Ball,and the KL divergence is adopted to measure the distance between nodes in the sparse adjacency matrix.This paper defines the KL-Ball consisting of four aspects:center,radius,mutual information and density.The center determines the location of KL-Ball in the complex network,and the radius determines the region of KL-Ball.The mutual information is used to measure the consistency of objects in a community,and the density is adopted to measure the coherence of a community.Given a radius,we aim to find communities with lower-information and higher-density in the complex network.For this purpose,we define the community mining objective function based on KL-Ball.Then we propose an optimization algorithm to minimize the objective function and theoretically prove the convergence of it.The proposed algorithm adopts a flexible community mining framework,and can be applied to several kinds of community mining tasks based on different locations and regions of the KL-Balls,such as the traditional community mining,high precision community mining and overlapping community detection.The experiment results show that the proposed KL-Ball based method can effectively find the community structure in complex network,including non-overlapping and overlapping communities.

Key words: Community mining, KL divergence, Non-overlapping community, Overlapping community

中图分类号: 

  • TP391
[1]GIRVAN M,NEWMAN M E J.Community structure in social and biological networks[J].Proceedings of the National Academy of Sciences,2002,99(12):7821-7826.
[2]LIU D Y,JIN D,HE D X,et al.Community Mining in ComplexNetworks[J].Journal of Computer Research and Development,2013,50(10):2140-2154.
[3]COSCIA M,GIANNOTTI F,PEDRESCHI D.A classificationfor community discovery methods in complex networks[J].Statistical Analysis and Data Mining:The ASA Data Science Journal,2011,4(5):512-546.
[4]RADICCHI F,CASTELLANO C,CECCONI F,et al.Defining and identifying communities in networks[J].Proceedings of the National Academy of Sciences of the United States of America,2004,101(9):2658-2663.
[5]FORTUNATO S.Community detection in graphs[J].PhysicsReports,2010,486(3/4/5):75-174.
[6]YANG B,CHEUNG W,LIU J.Community mining from signed social networks[J].IEEE Transactions on Knowledge and Data Engineering,2007,19(10):1333-1348.
[7]LIBEN-NOWELLY D,KLEINBERG J.The link predictionproblem for social networks[C]//Proceedings of the Twelfth International Conference on Information and Knowledge Mana-gement(CIKM).2004,556-559.
[8]YANG N,GONG D Z,LI X,et al.Survey of Web Communities Identification[J].Journal of Computer Research and Development,2005(3):439-447.
[9]SRIVASTAVA J,COOLEY R,DESHPANDE M,et al.Web usage mining:Discovery and applications of usage patterns from web data[J].Acm Sigkdd Explorations Newsletter,2000,1(2):12-23.
[10]BENSON A R,GLEICH D F,LESKOVEC J.Higher-order or-ganization of complex networks[J].Science,2016,353(6295):163.
[11]YIN D W,HONG L J,XIONG X,et al.Link formation analysisin microblogs[C]//Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (SIGIR 11).Beijing,China:ACM Press,2011:1235-126.
[12]NEWMAN M E J.Communities,modules and large-scale structure in networks[J].Nature Physics,2012(8):225-31.
[13]NEWMAN M E J.Fast algorithm for detecting communitystructure in networks[J].Physical Review E,2004,69(6):066133.
[14]CLAUSET A,NEWMAN M E J,MOORE C.Finding community structure in very large networks[J].Physical Review E,2004,70(6):066111.
[15]NEWMAN M E J.The structure and function of complex net-works[J].SIAM Review,2003,45(2):167-256.
[16]PALLA G,DERENYI L,FARKAS L,et al.Uncovering theoverlapping community structure of complex networks in nature and society[J].Nature,2005,435(9):814-818.
[17]BAUMES J.Efficient identification of overlapping communities[C]//Proc of IEEE International Conference on Intelligence and Security Informations.2005:27-36.
[18]LANCICHINETTI A,FORTUNATO S,KERTÉSZ J.Detecting the overlapping and hierarchical community structure in complex networks[J].New Journal of Physics,2009,11(3):033015.
[19]COVER T M,THOMAS J A.Elements of information theory [M].John Wiley & Sons.2012.
[20]TELGARSKY M,VATTANI A.Hartigan's Method:k-means clustering without voronoi[C]//International Conference on Artificial Intelligence and Statistics.2010:820-827.
[21]SLONIM N,AHARONI E,CRAMMER K.Hartigan's k-means versus Lloyd's k-means:is it time for a change?[C]//Proceedings of the International Joint Conference on Artificial Intelligence.2013:1677-1684.
[22]ZACHARY W W.An information flow model for conflict andfission in small groups[J].Journal of Anthropological Research,1977,33:452-473.
[23]NEWMAN M E J.Fast algorithm for detecting communitystructure in networks[J].Physical Review E,2004,69 (6):066133.
[24]NEWMAN M E J.Modularity and community structure in networks[J].Proceedings of the national academy of sciences,2006,103(23):8577-8582.
[25]BLONDEL V D,GUILLAUME J L,LAMBIOTTE R,et al.Fast unfolding of communities in large networks[J].Journal of Statistical Mechanics,2008,2008(10):155-168.
[26]RAGHAVAN U N,ALBERT R,KUMARA S.Near LinearTime Algorithm to Detect Community Structures in Large-scale Networks[J].Phys Rev E,2007,76(3):036106.
[27]LEE C,REID F,MCDAID A,et al.Detecting highly overlapping community structure by greedy clique expansion[C]//Proceedings Of the 4th SNA-KDD Workshop on Social Network Mining and Analysis.Washington,DC,USA,2010:33-42.
[28]EVANS T,LAMBIOTTE R.Line Graphs,Link Partitions and Overlapping Communities[J].Physical Review E,2009,80(1):16105.
[29]WHANG J J,DHILLON I S,GLEICH D F.Non-exhaustive,overlapping k-means[C]//Proceedings of the 2015 SIAM International Conference on Data Mining,Society for Industrial and Applied Mathematics.2015:936-944.
[30]KOUNI I B E,KAROUI W,ROMDHANE L B.Node Importance based Label Propagation Algorithm for overlapping community detection in networks[J].Expert Systems with Applications,2019,162:113020.
[31]ZHANG Q,CHEN H M,FENG Y F.Overlapping Community Detection Method Based on Rough Sets and Distance Dynamic Model[J].Computer Science,2020,47(10):75-82.
[32]XUE L,TANG X Q.Algorithm for Detecting Overlapping Communities Based on Centered Cliques[J].Computer Science,2020,47(8):157-163.
[33]WANG Y,BU Z,YANG H,et al.An effective and scalable overlapping community detection approach:Integrating social identity model and game theory[J].Applied Mathematics and Computation,2021,390:125601.
[34]LANCICHINETTI A,FORTUNATO S,RADICCHI F.Benchmark graphs for testing community detection algorithms[J].Physical Review E,2008,78(4):46-61.
[35]LUSSEAU D,NEWMAN M E J.Identifying the role that animals play in their social networks[J].Proceedings of the Royal Society of London.Series B:Biological Sciences,2004,271(suppl_6):S477-S481.
[36]NEWMAN M E J.Network Data[EB/OL].[2013-04-19].http://www-personal.umich.edu/~mejn/netdata/.
[37]MCDAID A F,GREENE D,HURLEY N.Normalized mutual information to evaluate overlapping community finding algorithms[J].arXiv:1110.2515,2011.
[38]COSCIA M,ROSSETTI G,GIANNOTTI F,et al.Demon:a local-first discovery method for overlapping communities[C]//Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2012:615-623.
[1] 陈湘涛, 赵美杰, 杨梅.
基于子图结构的局部社区发现算法
Overlapping Community Detection Algorithm Based on Subgraph Structure
计算机科学, 2021, 48(9): 244-250. https://doi.org/10.11896/jsjkx.201100010
[2] 宁懿昕, 谢辉, 姜火文.
图神经网络社区发现研究综述
Survey of Graph Neural Network in Community Detection
计算机科学, 2021, 48(11A): 11-16. https://doi.org/10.11896/jsjkx.210500151
[3] 薛磊, 唐旭清.
基于中心团的重叠社区检测算法
Algorithm for Detecting Overlapping Communities Based on Centered Cliques
计算机科学, 2020, 47(8): 157-163. https://doi.org/10.11896/jsjkx.200300034
[4] 张琴, 陈红梅, 封云飞.
一种基于粗糙集和密度峰值的重叠社区发现方法
Overlapping Community Detection Method Based on Rough Sets and Density Peaks
计算机科学, 2020, 47(5): 72-78. https://doi.org/10.11896/jsjkx.190400160
[5] 张琴, 陈红梅, 封云飞.
基于粗糙集和距离动态模型的重叠社区发现方法
Overlapping Community Detection Method Based on Rough Sets and Distance Dynamic Model
计算机科学, 2020, 47(10): 75-82. https://doi.org/10.11896/jsjkx.190800002
[6] 李建国, 赵海涛, 孙韶媛.
基于KL散度的策略优化
KL-divergence-based Policy Optimization
计算机科学, 2019, 46(6): 212-217. https://doi.org/10.11896/j.issn.1002-137X.2019.06.032
[7] 刘春, 张国良.
一种基于重叠社区发现的软件特征提取方法
Software Feature Extraction Method Based on Overlapping Community Detection
计算机科学, 2019, 46(12): 201-207. https://doi.org/10.11896/jsjkx.181001856
[8] 封云飞, 陈红梅.
基于拓扑结构的密度峰值重叠社区发现算法
Topological Structure Based Density Peak Algorithm for Overlapping Community Detection
计算机科学, 2019, 46(10): 39-48. https://doi.org/10.11896/jsjkx.180901644
[9] 戴彩艳, 陈崚, 胡孔法.
基于模块度增量的二分网络社区挖掘算法
Algorithm for Mining Bipartite Network Based on Incremental Modularity
计算机科学, 2018, 45(6A): 442-446.
[10] 邹凌君,陈崚,戴彩艳.
基于广义后缀树的二分网络社区挖掘算法
Detecting Community from Bipartite Network Based on Generalized Suffix Tree
计算机科学, 2017, 44(7): 221-226. https://doi.org/10.11896/j.issn.1002-137X.2017.07.039
[11] 张忠正,李建武.
一种基于局部拓展的并行重叠社区发现算法
Parallelizable Overlapping Community Detection Algorithm Based on Local Expansion
计算机科学, 2016, 43(9): 61-65. https://doi.org/10.11896/j.issn.1002-137X.2016.09.011
[12] 刘冰玉,王翠荣,王聪,苑迎.
基于引力因子的加权网络重叠社区识别算法
Overlapping Community Recognition Algorithm of Weighted Networks Based on Gravity Factor
计算机科学, 2016, 43(12): 153-157. https://doi.org/10.11896/j.issn.1002-137X.2016.12.027
[13] 徐建民,武晓波,吴树芳,粟武林.
基于WB-MMSB模型的微博网络社区发现
Community Detection for Micro-blog Network Based on WB-MMSB Model
计算机科学, 2015, 42(3): 65-70. https://doi.org/10.11896/j.issn.1002-137X.2015.03.014
[14] 李莉杰,陈端兵,王冠楠.
有向网络重叠社区的快速划分算法
Fast Algorithm for Overlapping Community Detection in Directed Networks
计算机科学, 2014, 41(Z6): 258-261.
[15] 陈端兵,尚明生,李霞.
重叠社区发现的两段策略
Two-phase Strategy on Overlapping Communities Detection
计算机科学, 2013, 40(1): 225-228.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!