计算机科学 ›› 2023, Vol. 50 ›› Issue (11): 77-87.doi: 10.11896/jsjkx.230600003

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于节点聚类复杂度的图聚类方法

郑文萍1,2,3, 王富民1, 刘美麟1, 杨贵1   

  1. 1 山西大学计算机与信息技术学院 太原 030006
    2 计算智能与中文信息处理教育部重点实验室(山西大学) 太原 030006
    3 山西大学智能信息处理研究所 太原 030006
  • 收稿日期:2023-05-31 修回日期:2023-08-24 出版日期:2023-11-15 发布日期:2023-11-06
  • 通讯作者: 郑文萍 (wpzheng@sxu.edu.cn)
  • 基金资助:
    国家自然科学基金(62072292);山西省1331工程项目

Graph Clustering Algorithm Based on Node Clustering Complexity

ZHENG Wenping1,2,3, WANG Fumin1, LIU Meilin1, YANG Gui1   

  1. 1 School of Computer and Information Technology,Shanxi University,Taiyuan 030006,China
    2 Key Laboratory of Computation Intelligence and Chinese Information Processing of Ministry of Education,Shanxi University,Taiyuan 030006,China
    3 Institute of Intelligent Information Processing,Shanxi University,Taiyuan 030006,China
  • Received:2023-05-31 Revised:2023-08-24 Online:2023-11-15 Published:2023-11-06
  • About author:ZHENG Wenping,born in 1979,Ph.D,professor,Ph.D supervisor,is a member of China Computer Federation.Her main research interests include graph theory algorithms and bioinformatics.
  • Supported by:
    National Natural Science Foundation of China(62072292) and 1331 Engineering Project of Shanxi Province,China.

摘要: 图聚类可以发现网络中的社区结构,是复杂网络分析中的一项重要任务。针对不同节点的聚类难度各异的问题,提出了一种基于节点聚类复杂度的图聚类算法(Graph Clustering Algorithm Based on Node Clustering Complexity,GCNCC),用于判断节点的聚类复杂度,为聚类复杂度低的节点赋予伪标签,利用伪标签提供的监督信息降低其他节点的聚类复杂度,进而得到网络聚类结果。GCNCC包括节点表示、节点聚类复杂度判别和图聚类3个主要模块。节点表示模块得到保持网络集聚性的表示;节点聚类复杂度判别模块用于判断网络中的低聚类复杂度节点,并利用低聚类复杂度节点的伪标签信息来优化更新网络中其他节点的聚类复杂度;图聚类模块采用标签传播方法,将低聚类复杂度节点标签传播给高聚类复杂度节点,以得到聚类结果。在3个真实的引文网络和3个生物数据集上与9种经典算法进行对比,算法GCNCC在ACC,NMI,ARI和F1等方面均表现良好。

关键词: 图聚类, 节点聚类复杂度, 网络嵌入, 自监督

Abstract: Graph clustering is an important task in the analysis of complex networks,which can reveal the community structure within a network.However,clustering complexity of nodes varies throughout the network.To address this issue,a graph clustering algorithm based on node clustering complexity(GCNCC)is proposed.It calculates the clustering complexity of nodes and assigns pseudo-labels to nodes with low clustering complexity.Then it uses these pseudo-labels as supervised information to lower the clustering complexity of other nodes to obtain the community structure of the network.The GCNCC algorithm consists of three main modules:node representation,node clustering complexity assessment,and graph clustering.The node representation module represents nodes in a low-dimensional space to maintain the clustering of nodes,the node clustering complexity assessment module identifies low clustering complexity nodes,and assigns them pseudo-labels,which can be used to update the clustering complexity of other nodes.The graph clustering module uses label propagation to spread the pseudo-labels from nodes with low clustering complexity to those with high clustering complexity.Compared with 9 classic algorithms on 3 real citation networks and 3 biological datasets,the proposed GCNCC performed well in terms of ACC,NMI,ARI,and F1.

Key words: Graph clustering, Node clustering complexity, Network embedding, Self-supervised

中图分类号: 

  • TP391
[1]NEWMAN M E J.Fast algorithm for detecting communitystructure in networks[J].Physical Review.E,2004,69(6):026113.
[2]BLONDEL V D,GUILLAUME J,LAMBIOTTE R,et al.Fast unfolding of communities in large networks[J/OL].Journal of Statistical Mechanics:Theory and Experiment,2008,(10):P10008.https://iopscience.iop.org/article/10.1088/1742-5468/2008/10/P10008/meta#artAbst.
[3]LANCICHINETTI A,FORTUNATO S,KERTÉSZ J.Detec-ting the overlapping and hierarchical community structure in complex networks[J].New Journal of Physics,2009,11(3):033015.
[4]WHANG J J,GLEICH D F,DHILLON I S.Overlapping community detection using neighborhood-inflated seed expansion[J].IEEE Transactions on Knowledge and Data Engineering,2016,28(5):1272-1284.
[5]RAGHAVAN U N,ALBERT R,SOUNDAR K.Near lineartime algorithm to detect community structures in large-scale networks[J].PHYSICAL REVIEW E,2007,76(3):036106.
[6]LI W,HUANG C,WANG M,et al.Stepping community detection algorithm based on label propagation and similarity[J].Physica A-Statistical Mechanics and Its Applications,2017,472:145-155.
[7]DING J J,HE X X,YUAN J Q,et al.Community detection by propagating the label of center[J].Physica A:Statistical Mechanics and its Applications,2018,503:675-686.
[8]LI C W,CHEN H M,LI T R,et al.A stable community detection approach for complex network based on density peak clustering and label propagation[J].Applied Intelligence,2021,52(2):1188-1208.
[9]LEE D D,SEUNG H.S.Learning the parts of objects by non-negative matrix factorization[J].Nature,1999(401):788-791.
[10]ROWEIS S T,SAUL L K.Nonlinear dimensionality reduction by locally linear embedding[J].Science,2000(290):2323-2326.
[11]BELKIN M,NIYOGI P.Laplacian eigenmaps and spectral techniques for embedding and clustering[J].Advances in Neural Information Processing Systems,2001,14(6):585-591.
[12]CAO S S,LU W,XU Q K.GraRep:Learning graph representa-tions with global structural information.[C]//Proceedings of the 24th ACM International on Conference on Information and Knowledge.Managemen New York,USA:ACM,2015:891-900.
[13]WANG X,CUI P,WANG J,et al.Community preserving network embedding[C]//Proceedings of the 32th AAAI Confe-rence on Artificial Intelligence.Palo Alto,USA:AAAI,2017:203-209.
[14]YANG C,LIU Z Y,ZHAO D L,et al.Network representation learning with rich text information[C]//Proceedings of the 24th International Joint Conference on Artificial Intelligence.Palo Alto,USA:AAAI,2015:2111-2117.
[15]HUANG X,LI J D,HU X.Accelerated attributed network embedding[C]//Proceedings of the 2017 SIAM International Conference on Data Mining.Philadelphia,USA:SIAM,2017:633-641.
[16]PEROZZI B,AL-RFOU R,SKIENA S.DeepWalk:Online lear-ning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Disco-very and Data Mining.New York,USA:ACM,2014:701-710.
[17]GROVER A,LESKOVEC J.Node2vec:Scalable feature learning for networks[C]//Proceedings of the 22th ACM SIGKDD International Conference on Knowledge Discovery and Data Mi-ning.New York,USA:ACM,2016:855-864.
[18]HUANG X,SONG Q Q,LI Y N,et al.Graph recurrent networks with attributed random walks.[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York,USA:ACM,2019:732-740.
[19]HONG R C,HE Y,WU L,et al.Deep attributed network embedding by preserving structure and attribute information[J].IEEE Transactions on Systems,Man,and Cybernetics:Systems,2021,51(3):1434-1445.
[20]KIPF T,WELLING M.Semi-Supervised classification withgraph convolutional networks[C]//Proceedings of the International Conference on Learning Representations.2016.
[21]KIPF T,WELLING M.Variational graph auto-encoders[C]//Proceedings of the NIPS Workshop on Bayesian Deep Learning.2016.
[22]VELICKOVIC P,CUCURULL G,CASANOVA A,et al.Graph attention networks[C]//Proceedings of the International Conference on Learning Representations.2018.
[23]GASTEIGER J,BOJCHEVSKI A,GÜNNEMANN S.Predictthen propagate:graph neural networks meet personalized pageRank[C]//Proceedings of the International Conference on Learning Representations.2018.
[24]WANG X,ZHU M Q,BO D Y,et al.AM-GCN:Adaptive multi-channel graph convolutional networks[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York,USA:ACM,2020:1243-1253.
[25]XIE J Y,GIRSHICK R,FARHADI A.Unsupervised deep embedding for clustering analysis[C]//Proceedings of the 33nd International Conference on Machine Learning.Cambridge,USA:MIT Press,2016:478-487.
[26]WANG C,PAN S R,HU R Q,et al.Attributed graph clustering:A deep attentional embedding approach[C]//Proceedings of the 28th International Joint Conference on Artificial Intelligence.San Francisco,USA:Morgan Kaufmann,2019:3670-3676.
[27]BO D Y,WANG X,SHI C,et al.Structural deep clustering network[C]//Proceedings of the 29th International World Wide Web Conference.New York,USA:ACM,2020:1400-1410.
[28]PAN S R,HU R Q,FUNG S F,et al.Learning graph embedding with adversarial training methods[J].IEEE Transactions on Cybernetics,2019,50(6):2475-2487.
[29]CUI G Q,ZHOU J,YANG C et al.Adaptive graph encoder for attributed graph embedding[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discoveryand Data Mining.New York,USA:ACM,2020:976-985.
[30]JI J Z,LIANG Y LEI M L.Deep attributed graph clustering with self-separation regularization and parameter-free cluster estimation[J].Neural networks,2021,142:522-533.
[31]CHEN D L,LIN Y K,LI W,et al.Measuring and relieving the over-smoothing problem for graph neural networks from the topological View[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence.Palo Alto,USA:AAAI,2020:3438-3445.
[32]XIE Y Q,LI S,YANG C,et al.When do GNNs work:Understanding and improving neighborhood aggregation[C]//Proceedings of the 29th International Joint Conference on Artificial Intelligence.San Francisco,USA:Morgan Kaufmann,2020:1303-1309.
[33]HE D X,GUO R,WANG X B,et al.Inflation improves graph neural networks[C]//Proceedings of the 31th International World Wide Web Conference.New York,USA:ACM,2022:1466-1474.
[34]DUAN L,AGGARWAL C,MA S,et al.Improving spectralclustering with deep embedding and cluster estimation[C]//Proceedings of the International Conference on Data Mining.Washington,USA:IEEE,2019:170-179.
[35]LEIBER C,BAUER L G M,SCHELLINGB,et al.Dip-baseddeep embedded clustering with k-estimation[C]//Proceedings of the 27th ACM SIGKDD International Conference on Know-ledge Discovery and Data Mining.New York,USA:ACM,2021:903-913.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!