计算机科学 ›› 2020, Vol. 47 ›› Issue (5): 72-78.doi: 10.11896/jsjkx.190400160

• 数据库&大数据&数据科学 • 上一篇    下一篇

一种基于粗糙集和密度峰值的重叠社区发现方法

张琴, 陈红梅, 封云飞   

  1. 西南交通大学信息科学与技术学院 成都611756
    西南交通大学云计算与智能技术高校重点实验室 成都611756
  • 收稿日期:2019-04-18 出版日期:2020-05-15 发布日期:2020-05-19
  • 通讯作者: 陈红梅(hmchen@swjtu.edu.cn)
  • 作者简介:qinzhang@my.swjtu.edu.cn
  • 基金资助:
    国家自然科学基金(61572406)

Overlapping Community Detection Method Based on Rough Sets and Density Peaks

ZHANG Qin, CHEN Hong-mei, FENG Yun-fei   

  1. School of Information Science and Technology,Southwest Jiaotong University,Chengdu 611756,China
    Key Laboratory of Cloud Computing and Intelligent Technology,Southwest Jiaotong University,Chengdu 611756,China
  • Received:2019-04-18 Online:2020-05-15 Published:2020-05-19
  • About author:ZHANG Qin,born in 1995,postgradua-te,is a member of China Computer Fe-deration.Her main research interests include database technology and data mining.
    CHEN Hong-mei,born in 1971,Ph.D,professor,Ph.D supervisor,is a ember of China Computer Federation.Her main research interests include granular calculation,rough sets and intelligent information processing.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61572406).

摘要: 随着互联网和社会的发展,各个领域每天都会产生大量相互关联、彼此依赖的数据,这些数据根据不同的主题形成了各种复杂网络。挖掘社区结构是复杂网络领域中的一项重要研究内容,因为其在推荐系统、行为预测和信息传播等方面具有极其重要的意义。社区结构中的重叠社区结构在生活中普遍存在,更具有实际研究意义。为有效发现复杂网络中的重叠社区,文中引入了粗糙集理论对社区进行分析,识别出重叠节点,进而提出了一种基于粗糙集和密度峰值的重叠社区发现方法OCDRD(Overlapping Community Detection Algorithm Based on Rough Sets and Density Peaks)。该方法在传统网络节点局部相似性度量的基础上,结合灰色关联分析方法求出网络节点间的全局相似性,进而将其转化为节点间距离。将密度峰值聚类算法的思想应用于该算法中,以根据网络结构自动选取社区中心节点。依据网络中节点的距离比例关系,定义了社区的上近似、下近似以及边界域。最后,不断调整距离比率阈值并进行划分迭代,在每次迭代中针对社区的边界域进行计算,从而获得最佳重叠社区划分结构。在LFR基准人工网络数据集和真实网络数据集上,基于标准互信息(Normalized Mutual Information,NMI)和具有重叠性模块度EQ这两个评价指标,将OCDRD方法与近几年效果较好的其他社区发现算法进行测试比较。实验结果显示,OCDRD方法在社区划分结构方面整体优于其他社区发现算法,表明了该算法的可行性和有效性。

关键词: 粗糙集, 灰色关联分析方法, 密度峰值, 重叠社区发现

Abstract: With the development of the Internet and society,a large number of interrelated and interdependent data is produced in various fields every day,which form various complex networks according to different themes.Mining community structure of complex networks is an important research content,which has extremely important significance in recommendation system,behavior prediction and information spreading.Moreover,overlapping community structure of complex networks exists universally in life,which has practical research significance.In order to detect overlapping communities effectively in complex networks,an overlapping community detection method OCDRD based on rough sets and density peaks is proposed in this paper,in which rough set theory is used to analyze communities and identify overlapping nodes.Firstly,the global similarities among network nodes are obtained by using grey correlation analysis method based on the traditional local similarity measure of network nodes.Then the global similarities among network nodes are converted to distance among nodes.The center nodes of the community are automatically selected by the network structure by applying the idea of density peaks based clustering.Next,the lower approximation,the upper approximation,and the boundary region of the community are defined according to the distance ratio relation among nodes in the network.Finally,the threshold value of distance ratio is adjusted iteratively,and the boundary region of the community is calculated repeatedly in each iteration until the optimal overlapping community structure is obtained.The OCDRD algorithm is compared with other community detection algorithms that have achieved good results in recent years both on LFR benchmark artificial network datasets and real network datasets.By analyzing two common community detection evaluation indexes,NMI and overlapping module degree EQ,the experimental results show that OCDRD algorithm is superior to other community detection algorithms in community partition structure andit is feasible and effective.

Key words: Density peaks, Grey correlation analysis method, Overlapping community detection, Rough set

中图分类号: 

  • TP391
[1]NEWMAN M E J.Networks:An Introduction[EB/OL].[2010-09].http://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780199206650.001.0001/acprof-9780199206650.
[2]BAI X Y,YANG P L,SHI X H.An overlapping community detection algorithm based on density peaks[J].Neurocomputing,2017,226:7-15.
[3]WU H,GAO L,DONG J H,et al.Detecting overlapping protein complexes by rough-fuzzy clustering in protein-protein interaction networks[J].PLoS One,2014,9(3):e91856.
[4]ZHANG X Y,WANG C T,SU Y S,et al.A fast overlapping community detection algorithm based on weak cliques for large-scale networks[J].IEEE Transactions on Computational Social Systems,2017,4(4):218-230.
[5]SUN H L,LIU J,HUANG J B,et al.LinkLPA:a link-based label propagation algorithm for overlapping community detection in networks[J].Computational Intelligence,2017,33(2):308-331.
[6]YU Z Y,CHEN J J,QUO K,et al.Overlapping community detection based on random walk and seeds extension[C]//Proceedings of the 12th Chinese Conference on Computer Supported Cooperative Work and Social Computing(ChineseCSCW '17).New York,USA:ACM Press,2017:18-24.
[7]BANSAL S,BHOWMICK S,PAYMAL P.Fast community detection for dynamic complex networks[M]//Communications in Computer and Information Science.Springer,2011:196-207.
[8]PAWLAK Z.Rough sets[J].International Journal of Computer &Information Sciences,1982,11(5):341-356.
[9]GRECO S,MATARAZZO B,SLOWINSKI R.Rough sets theory for multicriteria decision analysis[J].European Journal of Operational Research,2001,129(1):1-47.
[10]ZHANG W X,WU W Z.An introduction and a survey for the studies of rough set theory[J].Fuzzy Systems and Mathema-tics,2000,14(4):1-12.
[11]ZHANG Y,WU B,LIU Y.A novel community detection me-thod based on rough set K-means[J].Journal of Electronics and Information Technology,2017,39(4):770-777.
[12]ZHU W Q,FU Y C.Community structure detection algorithm based on rough set[J].Computer Engineering,2011,37(14):41-43.
[13]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
[14]DENG J L.The relational space in grey system theory[J].Fuzzy Mathematics,1985(2):1-10.
[15]NEWMAN M E J,MICHELLE G.Finding and evaluating community structure in networks[J].Physical Review E,2004,69(2):026113-026113.
[16]ZHOU X,LIU Y H,WANG J,et al.A density based link clustering algorithm for overlapping community detection in networks[J].Physica A:Statistical Mechanics and Its Applications,2017,486:65-78.
[17]XU X,YURUK N,FENG Z,et al.SCAN:a structural clustering algorithm for networks[C]//Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2007:824-833.
[18]LIU S F,CAI H,YANG Y J,et al.Advance in grey incidence analysis modelling[J].Systems Engineering-Theory & Practice,2013,33(8):2041-2046.
[19]ZHANG Q,QIU Q,GUO W,et al.A social community detection algorithm based on parallel grey label propagation[J].Computer Networks,2016,107(1):133-143.
[20]WANG M M,ZUO W,WANG Y.An improved density peaks-based clustering method for social circle discovery in social networks[J].Neurocomputing,2016,179:219-227.
[21]YANG Z,WANG H J,ZHOU Y.A clustering algorithm with adaptive cut-off distance and cluster centers[J].Data Analysis and Knowledge Discovery,2018(3):39-48.
[22]DWYER T,HONG S H,DIPK K,et al.Visual analysis of network centralities[C]//Proceedings of the 2006 Asia-Pacific Symposium on Information Visualisation.2006:189-197.
[23]BARTHELEMY M.[Lecture Notes in Morphogenesis] Mor-phogenesis of Spatial Networks‖Betweenness centrality[OL].https://www.onacademic.com/detail/journal_1000040157249110_02f6.html.
[24]SABIDUSSI G.The centrality index of a graph[J].Psychome-trika,1966,31(4):581-603.
[25]BONACICH P.Power and centrality:a family of measures[J].American Journal of Sociology,1987,92(5):1170-1182.
[26]DING J J,HE X X,YUAN J Q,et al.Community detection by propagating the label of center[J].Physica A:Statistical Mechanics and Its Applications,2018,503:675-686.
[27]HUANG L,WANG G S,WANG Y,et al.A link density clustering algorithm based on automatically selecting density peaks for overlapping community detection[J].International Journal of Modern Physics B,2016,30(24):1650167.
[28]LANCICHINETTI A,FORTUNATO S.Community detection algorithms:a comparative analysis[J].Physical Review E,2009,80(5):056117.
[29]XIE J R,KELLEY S,SZYMANSKI B K.Overlapping community detection in networks[J].ACM Computing Surveys,2013,45(4):1-35.
[30]NEWMAN M.Network data[EB/OL].[2013-04-19].http://www-personal.umich.edu/~mejn/netdata/.
[31]ZACHARY W W.An information flow model for conflict and fission in small groups[J].Journal of Anthropological Research,1977,33(4):452-473.
[32]LUSSEAU D,SCHNEIDER K,BOISSEAU O J,et al.The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations[J].Behavioral Ecology and Sociobiology,2003,54(4):396-405.
[33]KREBS V.Books about US Politics[EB/OL].http://www.orgnet.com/.
[34]GIRVAN M,NEWMAN M E J.Community structure in social and biological networks[EB/OL].http://arxiv.org/abs/cond-mat/0112110/.
[35]STREHL A,GHOSH J.Cluster ensembles:a knowledge reuse framework for combining partitionings[C]//Eighteenth national conference on Artificial intelligence.2002:93-98.
[36]LANCICHINETTI A,FORTUNATO S,KERTÉSZ J.Detecting the overlapping and hierarchical community structure in complex networks[J].New Journal of Physics,2009,11(3):033015.
[37]NICOSIA V,MANGIONI G,CARCHIOLO V,et al.Extending the definition of modularity to directed graphs with overlapping communities[J].Journal of Statistical Mechanics:Theory and Experiment,2009(3):P03024.
[1] 程富豪, 徐泰华, 陈建军, 宋晶晶, 杨习贝.
基于顶点粒k步搜索和粗糙集的强连通分量挖掘算法
Strongly Connected Components Mining Algorithm Based on k-step Search of Vertex Granule and Rough Set Theory
计算机科学, 2022, 49(8): 97-107. https://doi.org/10.11896/jsjkx.210700202
[2] 许思雨, 秦克云.
基于剩余格的模糊粗糙集的拓扑性质
Topological Properties of Fuzzy Rough Sets Based on Residuated Lattices
计算机科学, 2022, 49(6A): 140-143. https://doi.org/10.11896/jsjkx.210200123
[3] 方连花, 林玉梅, 吴伟志.
随机多尺度序决策系统的最优尺度选择
Optimal Scale Selection in Random Multi-scale Ordered Decision Systems
计算机科学, 2022, 49(6): 172-179. https://doi.org/10.11896/jsjkx.220200067
[4] 陈于思, 艾志华, 张清华.
基于三角不等式判定和局部策略的高效邻域覆盖模型
Efficient Neighborhood Covering Model Based on Triangle Inequality Checkand Local Strategy
计算机科学, 2022, 49(5): 152-158. https://doi.org/10.11896/jsjkx.210300302
[5] 孙林, 黄苗苗, 徐久成.
基于邻域粗糙集和Relief的弱标记特征选择方法
Weak Label Feature Selection Method Based on Neighborhood Rough Sets and Relief
计算机科学, 2022, 49(4): 152-160. https://doi.org/10.11896/jsjkx.210300094
[6] 王子茵, 李磊军, 米据生, 李美争, 解滨.
基于误分代价的变精度模糊粗糙集属性约简
Attribute Reduction of Variable Precision Fuzzy Rough Set Based on Misclassification Cost
计算机科学, 2022, 49(4): 161-167. https://doi.org/10.11896/jsjkx.210500211
[7] 王志成, 高灿, 邢金明.
一种基于正域的三支近似约简
Three-way Approximate Reduction Based on Positive Region
计算机科学, 2022, 49(4): 168-173. https://doi.org/10.11896/jsjkx.210500067
[8] 薛占熬, 侯昊东, 孙冰心, 姚守倩.
带标记的不完备双论域模糊概率粗糙集中近似集动态更新方法
Label-based Approach for Dynamic Updating Approximations in Incomplete Fuzzy Probabilistic Rough Sets over Two Universes
计算机科学, 2022, 49(3): 255-262. https://doi.org/10.11896/jsjkx.201200042
[9] 陈湘涛, 赵美杰, 杨梅.
基于子图结构的局部社区发现算法
Overlapping Community Detection Algorithm Based on Subgraph Structure
计算机科学, 2021, 48(9): 244-250. https://doi.org/10.11896/jsjkx.201100010
[10] 李艳, 范斌, 郭劼, 林梓源, 赵曌.
基于k-原型聚类和粗糙集的属性约简方法
Attribute Reduction Method Based on k-prototypes Clustering and Rough Sets
计算机科学, 2021, 48(6A): 342-348. https://doi.org/10.11896/jsjkx.201000053
[11] 汤鑫瑶, 张正军, 储杰, 严涛.
基于自然最近邻的密度峰值聚类算法
Density Peaks Clustering Algorithm Based on Natural Nearest Neighbor
计算机科学, 2021, 48(3): 151-157. https://doi.org/10.11896/jsjkx.200100112
[12] 宁懿昕, 谢辉, 姜火文.
图神经网络社区发现研究综述
Survey of Graph Neural Network in Community Detection
计算机科学, 2021, 48(11A): 11-16. https://doi.org/10.11896/jsjkx.210500151
[13] 薛占熬, 孙冰心, 侯昊东, 荆萌萌.
基于多粒度粗糙直觉犹豫模糊集的最优粒度选择方法
Optimal Granulation Selection Method Based on Multi-granulation Rough Intuitionistic Hesitant Fuzzy Sets
计算机科学, 2021, 48(10): 98-106. https://doi.org/10.11896/jsjkx.200800074
[14] 王卫东, 徐金慧, 张志峰, 杨习贝.
基于密度峰值聚类的高斯混合模型算法
Gaussian Mixture Models Algorithm Based on Density Peaks Clustering
计算机科学, 2021, 48(10): 191-196. https://doi.org/10.11896/jsjkx.200800191
[15] 张煜, 陆亿红, 黄德才.
基于密度峰值的加权犹豫模糊聚类算法
Weighted Hesitant Fuzzy Clustering Based on Density Peaks
计算机科学, 2021, 48(1): 145-151. https://doi.org/10.11896/jsjkx.200400043
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!