计算机科学 ›› 2025, Vol. 52 ›› Issue (7): 82-91.doi: 10.11896/jsjkx.241200098

• 计算机软件 • 上一篇    下一篇

基于可控偏好抽样的复杂网络度分布推断方法

池艺妍, 祁明泽, 黄彭奇子, 段晓君   

  1. 国防科技大学理学院 长沙 410073
  • 收稿日期:2024-12-12 修回日期:2025-04-08 发布日期:2025-07-17
  • 通讯作者: 祁明泽(qimingze17@nudt.edu.cn)
  • 作者简介:(chiyiyan@nudt.edu.cn)
  • 基金资助:
    国家自然科学基金(72401290);新一代人工智能国家科技重大专项(2023ZD0121101);湖南省科技厅创新平台项目(2024JC1003)

Degree Distribution Inference Method for Complex Networks Based on Controllable PreferentialSampling

CHI Yiyan, QI Mingze, HUANGPENG Qizi, DUAN Xiaojun   

  1. College of Science, National University of Defense Technology, Changsha 410073, China
  • Received:2024-12-12 Revised:2025-04-08 Published:2025-07-17
  • About author:CHI Yiyan,born in 2001,Ph.D candidate.Her main research interests include complex network and artificial intelligence.
    QI Mingze,born in 1995,Ph.D,lectu-rer.His main research interests include complex system,social network and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China(72401290),New Generation Artificial Intelligence National Science and Technology Major Project(2023ZD0121101) and Innovative Platform Project of the Department of Science and Technology of Hunan Province(2024JC1003).

摘要: 随着大数据时代的到来,复杂网络的规模和复杂性日益增长。受网络规模、动态性和隐私保护等限制,获取网络的完全信息难以实现。网络抽样作为一种有效的解决方案,可以通过获取网络的局部信息来估计整体属性和特征。然而,现有的网络抽样方法在获取节点的过程中往往存在不同程度的度偏置,导致样本网络度分布相较真实度分布出现显著偏差。针对这一问题,提出了一套基于可控偏好抽样的度分布推断框架。该框架包含一种单向边抽样方法,通过在邻居节点的选择概率中引入偏好参数,实现对抽样偏好的精确控制;同时,基于期望最大化算法,构建了针对偏好抽样数据的迭代推断机制,通过修正样本偏好,实现对原始网络度分布的准确推断。在模型网络和真实网络上的实验结果表明,该方法能够准确推断原始网络度分布,为有限观测信息下的网络鲁棒性分析和传播动力学研究等应用场景提供了可靠的属性分析工具。

关键词: 复杂网络, 网络抽样, 度偏好可控, 度分布推断, 期望最大化

Abstract: With the advent of the big data era,the scale and complexity of complex networks have grown exponentially.Due to constraints such as network size,dynamics,and privacy protection,obtaining complete information about these networks are impractical.Network sampling serves as an effective solution,allowing estimation of global properties and characteristics through local network information.However,existing network sampling methods often exhibit systematic degree bias during node acquisition,leading to considerable discrepancies between the degree distributions of sampled networks and the ground truth.To address this issue,this paper proposes a degree distribution inference framework based on controllable preferential sampling.This framework incorporates a directed edge sampling method that achieves precise control over sampling preference by introducing prefe-rence parameters into neighbor node selection probabilities.Additionally,it develops an iterative inference mechanism for prefe-rential sampled data based on the expectation-maximization algorithm,which accurately estimates the original network's degreedistribution by correcting sampling preferences.Experimental results on artificial networks and real networks demonstrate that the proposed method can accurately infer the original network's degree distribution,providing a reliable analytical tool for applications such as network robustness analysis and spreading dynamics research under limited observational information.

Key words: Complex network, Network sampling, Degree preference controllability, Degree distribution inference, Expectation maximization

中图分类号: 

  • TP391
[1]ZHOU T,ZHANG Z K,CHEN G R,et al.Opportunities and Challenges of Complex Networks Research[J].Journal of University of Electronic Science and Technology of China,2014,43(1):1-5.
[2]WANG X F,LI X,CHEN G R.Network Science:an Introduction[M].Beijing:Higher Education Press,2012.
[3]LYU X,LIU C C,CAI M S,et al.Network survey and indirect inference with privacy avoidance[J].Journal of Management Sciences in China,2023,26(5):103-120.
[4]HAGBERG A,CONWAY D.Networkx:Network analysis inpython[DB/OL].https://networkx.org/,2024.
[5]JU W,LI J,YU W,et al.iGraph:an incremental data processing system for dynamic graph[J].Frontiers of Computer Science,2016,10:462-476.
[6]QI X.A review:Random walk in graph sampling[J].arXiv:2209.13103,2022.
[7]HECKATHORN D D.Respondent-driven sampling:a new ap-proach to the study of hidden populations[J].Social Problems,1997,44(2):174-199.
[8]SALGANIK M J,HECKATHORN D D.Sampling and estimation in hidden populations using respondent-driven sampling[J].Sociological Methodology,2004,34(1):193-240.
[9]KURANT M,MARKOPOULOU A,THIRAN P.On the bias of BFS[C]//2010 22nd International Teletraffic Congress(LTC 22).IEEE,2010:1-8.
[10]KURANT M,MARKOPOULOU A,THIRAN P.Towards unbiased BFS sampling[J].IEEE Journal on Selected Areas in Communications,2011,29(9):1799-1809.
[11]GJOKA M,KURANT M,BUTTS C T,et al.Walking in facebook:A case study of unbiased sampling of OSNs[C]//2010 Proceedings IEEE INFOCOM.IEEE,2010:1-9.
[12]KOSSINETS G.Effects of missing data in social networks[J].Social Networks,2006,28(3):247-268.
[13]ACHLIOPTAS D,CLAUSET A,KEMPE D,et al.On the bias of traceroute sampling:or,power-law degree distributions in regular graphs[J].Journal of the ACM,2009,56(4):1-28.
[14]STUMPF M P,WIUF C,MAY R M.Subnets of scale-free networks are not scale-free:sampling properties of networks[J].Proceedings of the National Academy of Sciences,2005,102(12):4221-4224.
[15]ZHANG L,WANG F,JIANG H,et al.Random walk on node cliques for high-quality samples to estimate large graphs with high accuracies and low costs[J].Knowledge and Information Systems,2022,64(7):1909-1935.
[16]YOUSUF M I,KIM S.Guided sampling for large graphs[J].Data Mining and Knowledge Discovery,2020,34(4):905-948.
[17]EBADI JOKANDAN S M,BAYAT P,FARROKHBAKHTFOUMANI M.CS-and GA-based hybrid evolutionary sampling algorithm for large-scale social networks[J].Social Network Analysis and Mining,2021,11:1-15.
[18]ZHAO Y,JIANG H,QIN Y,et al.Preserving minority structures in graph sampling[J].IEEE Transactions on Visualization and Computer Graphics,2020,27(2):1698-1708.
[19]AHMED N K,NEVILLE J,KOMPELLA R.Reconsidering the foundations of network sampling[C]//Proceedings of the 2nd Workshop on Information in Networks.2010.
[20]ILLENBERGER J,FLÖTTERÖD G.Estimating propertiesfrom snowball sampled networks[J].Social Networks,2011,34:1-15.
[21]KION-CROSBY W B,MOROZOV A V.Rapid bayesian infe-rence of global network statistics using random walks[J].Physical Review Letters,2018,121(3):038301.
[22]NILAKANTA H.Output analysis of Monte Carlo methods with applications to networks and functional approximation[D].Minnesota:University of Minnesota,2020.
[23]PÓSFAI M,BARABÁSI A L.Network science[M].Cambridge:Cambridge University Press,2016.
[24]OTSUKA M,TSUGAWA S.Robustness of network attackstrategies against node sampling and link errors[J].PLoS One,2019,14(9):e0221885.
[25]FOURNET J,BARRAT A.Estimating the epidemic risk using non-uniformly sampled contact data[J].Scientific Reports,2017,7(1):9975.
[26]QI M,TAN S,CHEN P,et al.Efficient network intervention with sampling information[J].Chaos,Solitons & Fractals,2023,166:112952.
[27]CUI Y,LI X,LI J,et al.A survey of sampling method for social media embeddedness relationship[J].ACM Computing Surveys,2022,55(4):1-39.
[28]AHMED N,NEVILLE J,KOMPELLA R R.Network sampling via edge-based node selection with graph induction[J].Compu-ter Science Technical Reports,2011,11-016:1747.
[29]NOH J D,RIEGER H.Random walks on complex networks[J].Physical Review Letters,2004,92(11):118701.
[30]TIAN Y,LAMBIOTTE R.Structural balance and random walks on complex networks with complex weights[J].SIAM Journal on Mathematics of Data Science,2024,6(2):372-399.
[31]WU Y,CAO N,ARCHAMBAULT D,et al.Evaluation of graph sampling:A visualization perspective[J].IEEE Transactions on Visualization and Computer Graphics,2016,23(1):401-410.
[32]FELD S L.Why Your Friends Have More Friends Than You Do[J].American Journal of Sociology,1991,96(6):1464-1477.
[33]COHEN R,HAVLIN S,BEN-AVRAHAM D.Efficient immunization strategies for computer networks and populations[J].Physical Review Letters,2003,91(24):247901.
[34]LIU Y,HUANG Q,DONG G,et al.Acquaintance immunization with limited knowledge of network structure[J].New Journal of Physics,2023,25(9):093017.
[35]XUE Y,BOGDAN P.Reconstructing missing complex networks against adversarial interventions[J].Nature Communications,2019,10(1):1738.
[36]ZHENG Y M,JIA C Y,CHANG Z H,et al.A Degree Corrected Stochastic Block Model for Attributed Networks[J].Journal of Computer Research and Development,2020,57(8):1650-1662.
[37]KIM M,LESKOVEC J.The network completion problem:Inferring missing nodes and edges in networks[C]//Proceedings of the 2011 SIAM International Conference on Data Mining.2011:47-58.
[38]ZHANG X.Research on Multi-source Data Network Structure Inference and Balance Robustness of Network[D].Shenzhen:Shenzhen University,2020.
[39]YANG Z,COHEN W,SALAKHUDINOV R.Revisiting semi-supervised learning with graph embeddings[C]//International Conference on Machine Learning.2016:40-48.
[40]ROZEMBERCZKI B,SARKAR R.Characteristic functions ongraphs:Birds of a feather,from statistical descriptors to parametric models[C]//Proceedings of the 29th ACM International Conference on Information & Knowledge Management.2020:1325-1334.
[41]ZAFARANI R,LIU H.Users joining multiple sites:Distributions and patterns[C]//Proceedings of the International AAAI Conference on Web and Social Media.2014:635-638.
[42]RICHARDSON M,AGRAWAL R,DOMINGOS P.Trustmanagement for the semantic web[C]//International Semantic Web Conference.Springer,2003:351-368.
[43]KLIMT B,YANG Y.The enron corpus:A new dataset for email classification research[C]//European Conference on Machine Learning.Springer,2004:217-226.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!