计算机科学 ›› 2024, Vol. 51 ›› Issue (4): 117-123.doi: 10.11896/jsjkx.230100018

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于代表性节点扩张的保持社区结构的图采样算法

宏宇1, 陈鸿昶2, 张建朋2, 黄瑞阳2, 李邵梅2   

  1. 1 郑州大学网络空间安全学院 郑州450000
    2 中国人民解放军战略支援部队信息工程大学信息技术研究所 郑州450000
  • 收稿日期:2023-01-03 修回日期:2023-12-19 出版日期:2024-04-15 发布日期:2024-04-10
  • 通讯作者: 张建朋(j_zhang_edu@sina.com)
  • 作者简介:(1161738832@qq.com)
  • 基金资助:
    国家自然科学基金(62002384);中国博士后科学基金面上项目(2020M683760);嵩山实验室项目(221100210700-03)

Graph Sampling Algorithm Based on Representative Node Expansion to Maintain CommunityStructure

HONG Yu1, CHEN Hongchang2, ZHANG Jianpeng2, HUANG Ruiyang2 , LI Shaomei2   

  1. 1 College of Cyberspace Security,Zhengzhou University,Zhengzhou 450000,China
    2 Institute of Information Technology,PLA Strategic Support Force Information Engineering University,Zhengzhou 450000,China
  • Received:2023-01-03 Revised:2023-12-19 Online:2024-04-15 Published:2024-04-10
  • Supported by:
    National Natural Science Foundation of China(62002384),China Postdoctoral Science Foundation(2020M683760) and Songshan Laboratory Project(221100210700-03).

摘要: 作为一种能够简化大规模图并保留其指定属性的方法,图采样被广泛应用于现实生活中。然而当前研究大多集中于保留节点级的性质,如度分布等,而忽略了图的社区结构等更为重要的信息。针对此问题,提出了一种保持社区结构的图采样算法。算法主要分为两个步骤,第一步为初始化社区代表点,根据提出的节点重要度计算公式算出节点的重要度,然后选出每个社区的代表性节点;第二步为社区结构扩张,针对每个社区,选择可能引入最少额外邻居的节点加入社区中,直到达到该社区节点上限。在多个真实数据集上进行了对比实验,使用多个评价指标来评估实验结果。实验结果表明,所提出的采样算法能够很好地保持原始图的社区结构,为大规模图的社区结构采样提供了可行的解决方案。

关键词: 图采样, 社区结构, 代表性节点, 扩张, 重要度

Abstract: Graph sampling is widely used in real life as a method to simplify large-scale graphs and retain specified properties.However,most of the current research focuses on preserving node-level properties,such as degree distribution,while ignoring more important information such as the community structure of graphs.To solve this problem,a graph sampling algorithm is proposed to maintain the community structure.The algorithm is divided into two steps.The first step is to initialize the community representative points,and the node importance is calculated according to the proposed node importance calculation formula,and then the representative nodes of each community are selected.The second step is to expand the community structure.For each community,it selects the node that can introduce the least additional neighbors to join the community until the upper limit of the community node is reached.Comparative experiments are conducted on a number of real data sets,and multiple evaluation indicators are adopted to evaluate the experimental results.Experimental results show that the proposed sampling algorithm can well maintain the overall community structure,and provides a feasible solution for sampling community structure of large-scale graphs.

Key words: Graph sampling, Community structure, Representative nodes, Expansion, Importance

中图分类号: 

  • TP391
[1]ROZEMBERCZKI B,KISS O,SARKAR R.Little Ball of Fur:A Python Library for Graph Sampling[C]//Proceedings of the 29th ACM International Conference on Information & Know-ledge Management.Virtual Event Ireland:Association for Computing Machinery,2020:3133-3140.
[2]AHMED N,NEVILLE J,KOMPELLA R.Network Sampling:From Static to Streaming Graphs [J].ACM Transactions on Knowledge Discovery from Data,2014,8(2):1-56.
[3]LESKOVEC U,FALOUTSOS C.Sampling from large graphs[C]//Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:Association for Computing Machinery,2006:631-636.
[4]RIBERIO B,TOWSLEY D.Estimating and sampling graphswith multidimensional random walks [C]//Internet Measurement Conference.New York:Association for Computing Machinery,2010:390-403.
[5]ZHAO Y,JIANG H J,CHEN Q A,et al.Preserving Minority Structures in Graph Sampling [J].IEEE Transactions on Visua-lization and Computer Graphics,2021,27(2):1698-1708.
[6]MAIYA A S,BERGER-WOLF T Y.Sampling communitystructure[C]//Proceedings of the 19th International World Wide Web Conference.New York:Association for Computing Machinery,2010:701-710.
[7]WANG F,CHEUNG G,WANG Y C.Low-complexity GraphSampling With Noise and Signal Reconstruction via Neumann Series[J].IEEE Transactions on Signal Processing,2019,67(21):5511-5526.
[8]JIAN B,SHI J M,ZHANG W S,et al.Graph sampling for Internet topologies using normalized Laplacian spectral features[J].Information Sciences,2019,481:574-603.
[9]ZHOU Z G,SHI C,SHEN X L,et al.Context-aware Sampling of Large Networks via Graph Representation Learning[J].IEEE Transactions on Visualization and Computer Graphics,2021,27(2):1709-1719.
[10]HU J B,DAI G H,WANG Y,et al.GraphSDH:A GeneralGraph Sampling Framework with Distribution and Hierarchy[C]//2020 IEEE High Performance Extreme Computing Conference(HPEC).Waltham:IEEE,2020:1-7.
[11]MALL R,LANGONE R,SUYKENS J A K.FURS:Fast andUnique Representative Subset selection retaining large-scale community structure[J].Social Network Analysis and Mining,2013,3(4):1075-1095.
[12]ZHANG J P,CHEN H C,YU D J,et al.Cluster-preserving sampling algorithm for large-scale graphs[J].Science China Information Sciences,2022,66:112103.
[13]ZHANG J P,PEI Y L,GEORGE F,et al.Evaluation of theSample Clustering Process on Graphs[J].IEEE Transactions on Knowledge and Data Engineering,2020,32(7):1333-1347.
[14]SANTO F,DARTO H.Community detection in networks:Auser guide[J].Physics Reports,2016,659:1-44.
[15]MARK E J N.Modularity and community structure in networks[J].Proceedings of the National Academy of Sciences,2006,103(23):8577-8582.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!