计算机科学 ›› 2021, Vol. 48 ›› Issue (12): 204-211.doi: 10.11896/jsjkx.210300060

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于矩阵分解的属性网络嵌入和社区发现算法

徐新黎, 肖云月, 龙海霞, 杨旭华, 毛剑飞   

  1. 浙江工业大学计算机科学与技术学院 杭州310023
  • 收稿日期:2021-03-05 修回日期:2021-06-08 发布日期:2021-11-26
  • 通讯作者: 毛剑飞(mjf@zjut.edu.cn)
  • 作者简介:xxl@zjut.edu.cn
  • 基金资助:
    国家自然科学基金项目(61773348);浙江省公益科技计划项目(LGG20F020017);浙江省自然科学基金项目(LQ18F030015)

Attributed Network Embedding Based on Matrix Factorization and Community Detection

XU Xin-li, XIAO Yun-yue, LONG Hai-xia, YANG Xu-hua, MAO Jian-fei   

  1. College of Computer Science and Technology,Zhejiang University of Technology,Hangzhou 310023,China
  • Received:2021-03-05 Revised:2021-06-08 Published:2021-11-26
  • About author:XU Xin-li,born in 1977,Ph.D,associate professor,is a member of China Computer Federation.Her main research interests include intelligent computing,network embedding and medical image computing.
    MAO Jian-fei,born in 1976,Ph.D,associate professor,is a member of China Computer Federation.His main research interests include network visual media and mobile media technology,computer,vision and embedded system.
  • Supported by:
    National Natural Science Foundation of China(61773348),Zhejiang Public Welfare Science and Technology Plan Project(LGG20F020017) and Natural Science Foundation of Zhejiang Province(LQ18F030015).

摘要: 属性网络不但包含节点之间复杂的拓扑结构,还包含拥有丰富属性信息的节点,其可以比传统网络更有效地建模现代信息系统,属性网络的社区划分对于分析复杂系统的层次结构、控制信息在网络中的传播和预测网络用户的群体行为等方面具有重要的研究价值。为了更好地利用拓扑结构信息和属性信息进行社区发现,提出了一种基于矩阵分解的属性网络嵌入和社区发现算法(CDEMF)。首先提出基于矩阵分解的属性网络嵌入方法,基于网络局部链接信息计算相邻节点的相似性,将其与属性接近度联合建模,通过矩阵分解的分布式算法得到每个节点对应的低维嵌入向量,即把网络节点映射为低维向量表示的数据点集合。接着提出基于曲率和模块度的社区划分方法,自动确定数据点集合中蕴含的社区数量,并通过对数据点集合聚类完成属性网络社区划分。在真实网络数据集上,将CDEMF方法与其他8种知名算法进行比较,实验结果表明CDEMF具有良好的性能。

关键词: 属性网络嵌入, 矩阵分解, 自动聚类, 社区发现, 曲率

Abstract: An attributed network contains not only the complex topological structure but also the nodes with rich attribute information.It can be used to more effectively model modern information systems than traditional networks.Community detection of the attributed network has important research value in hierarchical analysis of complex systems,control of information propagation in the network,and prediction of group behavior of network users.In order to make better use of topology information and attribute information for community discovery,an attributed network embedding based on matrix factorization and community detection(CDEMF) are proposed.First,an attributed network embedding method based on matrix factorization is proposed to model the attributed proximity and the similarity of adjacent nodes calculated in term of the local link information of the network,where the low-dimensional embedding vector corresponding to each node can be obtained by a distributed algorithm of matrix decomposition,that is,the network nodes can be mapped into a collection of data points represented by low-dimensional vectors.Then the community detection method based on curvature and modularity is developed to achieve attributed network community division by clustering the data point set,which can automatically determine the number of communities contained in the data point set.CDEMF is compared with the other 8 kinds of well-known approaches on public real network datasets.The experimental results demonstrate the effectiveness and superiority of CDEMF.

Key words: Attributed network embedding, Matrix factorization, Automatic clustering, Community detection, Curvature

中图分类号: 

  • TP391
[1]BLONDEL V D,GUILLAUME J L,LAMBIOTTE R,et al.Fast unfolding of communities in large networks[J].Journal of Statistical Mechanics:Theory and Experiment,2008(10):P10008.
[2]RAGHAVAN U N,ALBERT R,KUMARA S.Near linear time algorithm to detect community structures in large-scale networks[J].Physical Review E,2007,76(3):036106.
[3]ROZEMBERCZKI B,DAVIES R,SARKAR R,et al.GEMSEC:Graph embedding with self clustering[C]//Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.2019:65-72.
[4]SUN F Y,QU M,HOFFMANN J,et al.vGRAPH:A generative model for joint community detection and node representation learning[C]//Advances in Neural Information Processing Systems.2019:514-524.
[5]YANG J,MCAULEY J,LESKOVEC J.Community detection in networks with node attributes[C]//2013 IEEE 13th International Conference on Data Mining.IEEE,2013:1151-1156.
[6]JIN D,LIU Z Y,HE R F,et al.A Robust and Strong Explanation Community Detection Method for Attributed Networks[J].Chinese Journal of Computers,2018,41(7):1476-1489.
[7]HUANG X,LI J,HU X.Accelerated attributed network embedding[C]//Proceedings of the 2017 SIAM International Confe-rence on Data Mining.Society for Industrial and Applied Mathematics,2017:633-641.
[8]HUANG X,LI J,ZOU N,et al.A general embedding framework for heterogeneous information learning in large-scale networks[J].ACM Transactions on Knowledge Discovery from Data(TKDD),2018,12(6):1-24.
[9]LI J,HU X,WU L,et al.Robust unsupervised feature selection on networked data[C]//Proceedings of the 2016 SIAM International Conference on Data Mining.Society for Industrial and Applied Mathematics,2016:387-395.
[10]PEROZZI B,AL-RFOU R,SKIENA S.Deepwalk:Online lear- ning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Disco-very and Data Mining.2014:701-710.
[11]TANG J,QU M,WANG M,et al.Line:Large-scale information network embedding[C]//Proceedings of the 24th International Conference on World Wide Web.2015:1067-1077.
[12]WANG H,YANG Y,LIU B.GMC:Graph-based multi-view clustering[J].IEEE Transactions on Knowledge and Data Engineering,2019,32(6):1116-1129.
[13]SUN H,HE F,HUANG J,et al.Network embedding for community detection in attributed networks[J].ACM Transactions on Knowledge Discovery from Data(TKDD),2020,14(3):1-25.
[14]YE Z L,ZHAO H X,ZHANG K,et al.Network representation learning algorithm based on multi-view integration[J].Compu-ter Science,2019,46(1):124-132.
[15]ZHANG B,YU Z,ZHANG W.Community-Centric Graph Convolutional Network for Unsupervised Community Detection[C]//IJCAI.2020.
[16]ESTER M,KRIEGEL H P,SANDER J,et al.A density-based algorithm for discovering clusters in large spatial databases with noise[C]//KDD-96 Proceedings.AAAI,1996:226-231.
[17]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
[18]CHEN J F,ZHANG M,HE Q.NJW spectral clustering algorithm based on heuristics to determine the number of classes[J].Computer Science,2018,45(2):474-479.
[19]TIBSHIRANI R,WALTHER G,HASTIE T.Estimating the number of clusters in a data set via the gap statistic[J].Journal of the Royal Statistical Society:Series B(Statistical Methodology),2001,63(2):411-423.
[20]SUGAR C A,JAMES G M.Finding the number of clusters in a dataset:An information-theoretic approach[J].Journal of the American Statistical Association,2003,98(463):750-763.
[21]ZHANG Y,MAHDZIUK J,QUEK C H,et al.Curvature-based method for determining the number of clusters[J].Information Sciences,2017,415-416:414-428.
[22]YANG Y,LIU H,GUAN Z,et al.CoHomo:A cluster-attribute correlation aware graph clustering framework[J].Neurocomputing,2020,412:327-338.
[23]KUANG D,DING C,PARK H.Symmetric nonnegative matrix factorization for graph clustering[C]//Proceedings of the 2012 SIAM international conference on data mining.Society for Industrial and Applied Mathematics,2012:106-117.
[24]CALINSKI T,HARABASZ J.A dendrite method for cluster analysis[J].Communications in Statistics-theory and Methods,1974,3(1):1-27.
[25]KRZANOWSKI W J,LAI Y T.A criterion for determining the number of groups in a data set using sum-of-squares clustering[J].Biometrics,1988,44(1):23-34.
[26]NEWMAN M E J.Fast algorithm for detecting community structure in networks[J].Physical Review E,2004,69(6):066133.
[27]NEWMAN M E J.Modularity and community structure in networks[J].Proceedings of the National Academy of Sciences,2006,103(23):8577-8582.
[28]HUANG X,LI J,HU X.Label informed attributed network embedding[C]//Proceedings of the Tenth ACM International Conference on Web Search and Data Mining.2017:731-739.
[29]HUANG Z,ZHONG X,WANG Q,et al.Detecting community in attributed networks by dynamically exploring node attributes and topological structure[J].Knowledge-Based Systems,2020,196:105760.
[30]PAN Y,HU G,QIU J,et al.FLGAI:a unified network embedding framework integrating multi-scale network structures and node attribute information[J].Applied Intelligence,2020,50(11):3976-3989.
[31]GAO Y,GONG M,XIE Y,et al.Community-oriented attributed network embedding[J].Knowledge-Based Systems,2020,193:105418.
[32]YOU X,MA Y,LIU Z.A three-stage algorithm on community detection in social networks[J].Knowledge-Based Systems,2020,187:104822.
[1] 董晓梅, 王蕊, 邹欣开. 面向推荐应用的差分隐私方案综述[J]. 计算机科学, 2021, 48(9): 21-35.
[2] 官铮, 邓扬琳, 聂仁灿. 光谱重建约束非负矩阵分解的高光谱与全色图像融合[J]. 计算机科学, 2021, 48(9): 153-159.
[3] 陈湘涛, 赵美杰, 杨梅. 基于子图结构的局部社区发现算法[J]. 计算机科学, 2021, 48(9): 244-250.
[4] 詹皖江, 洪植林, 方路平, 吴哲夫, 吕跃华. 基于对抗性学习的协同过滤推荐算法[J]. 计算机科学, 2021, 48(7): 172-177.
[5] 邵超, 宋淑米. 基于信任关系下用户兴趣偏好的协同过滤推荐算法[J]. 计算机科学, 2021, 48(6A): 240-245.
[6] 段菲, 王慧敏, 张超. 面向数据表示的Cauchy非负矩阵分解[J]. 计算机科学, 2021, 48(6): 96-102.
[7] 郝志峰, 廖祥财, 温雯, 蔡瑞初. 基于多上下文信息的协同过滤推荐算法[J]. 计算机科学, 2021, 48(3): 168-173.
[8] 宁懿昕, 谢辉, 姜火文. 图神经网络社区发现研究综述[J]. 计算机科学, 2021, 48(11A): 11-16.
[9] 李雨蓉, 刘杰, 刘亚林, 龚春叶, 王勇. 面向语音分离的深层转导式非负矩阵分解并行算法[J]. 计算机科学, 2020, 47(8): 49-55.
[10] 张清琪, 刘漫丹. 复杂网络社区发现的多目标五行环优化算法[J]. 计算机科学, 2020, 47(8): 284-290.
[11] 刘君良, 李晓光. 个性化推荐系统技术进展[J]. 计算机科学, 2020, 47(7): 47-55.
[12] 李向利, 贾梦雪. 基于预处理的超图非负矩阵分解算法[J]. 计算机科学, 2020, 47(7): 71-77.
[13] 董明刚, 弓佳明, 敬超. 基于谱聚类的多目标进化社区发现算法研究[J]. 计算机科学, 2020, 47(6A): 461-466.
[14] 马海江. 基于卷积神经网络与约束概率矩阵分解的推荐算法[J]. 计算机科学, 2020, 47(6A): 540-545.
[15] 张琴, 陈红梅, 封云飞. 一种基于粗糙集和密度峰值的重叠社区发现方法[J]. 计算机科学, 2020, 47(5): 72-78.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 蒋苏蓉,蓝江桥,杨玉海. Hadoop框架下的情报分析大数据调度超时预测方法[J]. 计算机科学, 2014, 41(Z6): 409 -413 .
[2] 陈雪萍, 何勇, 肖芬芳. 一类同步自动机及损耗函数分析[J]. 计算机科学, 2019, 46(11A): 535 -538 .
[3] 陆龙龙, 陈统, 潘敏学, 张天. CodeSearcher:基于自然语言功能描述的代码查询[J]. 计算机科学, 2020, 47(9): 1 -9 .
[4] 潘孝勤, 芦天亮, 杜彦辉, 仝鑫. 基于深度学习的语音合成与转换技术综述[J]. 计算机科学, 2021, 48(8): 200 -208 .
[5] 王俊, 王修来, 庞威, 赵鸿飞. 面向科技前瞻预测的大数据治理研究[J]. 计算机科学, 2021, 48(9): 36 -42 .
[6] 余力, 杜启翰, 岳博妍, 向君瑶, 徐冠宇, 冷友方. 基于强化学习的推荐研究综述[J]. 计算机科学, 2021, 48(10): 1 -18 .
[7] 王梓强, 胡晓光, 李晓筱, 杜卓群. 移动机器人全局路径规划算法综述[J]. 计算机科学, 2021, 48(10): 19 -29 .
[8] 高洪皓, 郑子彬, 殷昱煜, 丁勇. 区块链技术专题序言[J]. 计算机科学, 2021, 48(11): 1 -3 .
[9] 毛瀚宇, 聂铁铮, 申德荣, 于戈, 徐石成, 何光宇. 区块链即服务平台关键技术及发展综述[J]. 计算机科学, 2021, 48(11): 4 -11 .
[10] 肖锋, 张鹏程, 罗夏朴. 基于正则表达式、程序插桩和代码替换的以太坊智能合约bug检测和修复方法[J]. 计算机科学, 2021, 48(11): 89 -101 .