计算机科学 ›› 2025, Vol. 52 ›› Issue (7): 82-91.doi: 10.11896/jsjkx.241200098
池艺妍, 祁明泽, 黄彭奇子, 段晓君
CHI Yiyan, QI Mingze, HUANGPENG Qizi, DUAN Xiaojun
摘要: 随着大数据时代的到来,复杂网络的规模和复杂性日益增长。受网络规模、动态性和隐私保护等限制,获取网络的完全信息难以实现。网络抽样作为一种有效的解决方案,可以通过获取网络的局部信息来估计整体属性和特征。然而,现有的网络抽样方法在获取节点的过程中往往存在不同程度的度偏置,导致样本网络度分布相较真实度分布出现显著偏差。针对这一问题,提出了一套基于可控偏好抽样的度分布推断框架。该框架包含一种单向边抽样方法,通过在邻居节点的选择概率中引入偏好参数,实现对抽样偏好的精确控制;同时,基于期望最大化算法,构建了针对偏好抽样数据的迭代推断机制,通过修正样本偏好,实现对原始网络度分布的准确推断。在模型网络和真实网络上的实验结果表明,该方法能够准确推断原始网络度分布,为有限观测信息下的网络鲁棒性分析和传播动力学研究等应用场景提供了可靠的属性分析工具。
中图分类号:
[1]ZHOU T,ZHANG Z K,CHEN G R,et al.Opportunities and Challenges of Complex Networks Research[J].Journal of University of Electronic Science and Technology of China,2014,43(1):1-5. [2]WANG X F,LI X,CHEN G R.Network Science:an Introduction[M].Beijing:Higher Education Press,2012. [3]LYU X,LIU C C,CAI M S,et al.Network survey and indirect inference with privacy avoidance[J].Journal of Management Sciences in China,2023,26(5):103-120. [4]HAGBERG A,CONWAY D.Networkx:Network analysis inpython[DB/OL].https://networkx.org/,2024. [5]JU W,LI J,YU W,et al.iGraph:an incremental data processing system for dynamic graph[J].Frontiers of Computer Science,2016,10:462-476. [6]QI X.A review:Random walk in graph sampling[J].arXiv:2209.13103,2022. [7]HECKATHORN D D.Respondent-driven sampling:a new ap-proach to the study of hidden populations[J].Social Problems,1997,44(2):174-199. [8]SALGANIK M J,HECKATHORN D D.Sampling and estimation in hidden populations using respondent-driven sampling[J].Sociological Methodology,2004,34(1):193-240. [9]KURANT M,MARKOPOULOU A,THIRAN P.On the bias of BFS[C]//2010 22nd International Teletraffic Congress(LTC 22).IEEE,2010:1-8. [10]KURANT M,MARKOPOULOU A,THIRAN P.Towards unbiased BFS sampling[J].IEEE Journal on Selected Areas in Communications,2011,29(9):1799-1809. [11]GJOKA M,KURANT M,BUTTS C T,et al.Walking in facebook:A case study of unbiased sampling of OSNs[C]//2010 Proceedings IEEE INFOCOM.IEEE,2010:1-9. [12]KOSSINETS G.Effects of missing data in social networks[J].Social Networks,2006,28(3):247-268. [13]ACHLIOPTAS D,CLAUSET A,KEMPE D,et al.On the bias of traceroute sampling:or,power-law degree distributions in regular graphs[J].Journal of the ACM,2009,56(4):1-28. [14]STUMPF M P,WIUF C,MAY R M.Subnets of scale-free networks are not scale-free:sampling properties of networks[J].Proceedings of the National Academy of Sciences,2005,102(12):4221-4224. [15]ZHANG L,WANG F,JIANG H,et al.Random walk on node cliques for high-quality samples to estimate large graphs with high accuracies and low costs[J].Knowledge and Information Systems,2022,64(7):1909-1935. [16]YOUSUF M I,KIM S.Guided sampling for large graphs[J].Data Mining and Knowledge Discovery,2020,34(4):905-948. [17]EBADI JOKANDAN S M,BAYAT P,FARROKHBAKHTFOUMANI M.CS-and GA-based hybrid evolutionary sampling algorithm for large-scale social networks[J].Social Network Analysis and Mining,2021,11:1-15. [18]ZHAO Y,JIANG H,QIN Y,et al.Preserving minority structures in graph sampling[J].IEEE Transactions on Visualization and Computer Graphics,2020,27(2):1698-1708. [19]AHMED N K,NEVILLE J,KOMPELLA R.Reconsidering the foundations of network sampling[C]//Proceedings of the 2nd Workshop on Information in Networks.2010. [20]ILLENBERGER J,FLÖTTERÖD G.Estimating propertiesfrom snowball sampled networks[J].Social Networks,2011,34:1-15. [21]KION-CROSBY W B,MOROZOV A V.Rapid bayesian infe-rence of global network statistics using random walks[J].Physical Review Letters,2018,121(3):038301. [22]NILAKANTA H.Output analysis of Monte Carlo methods with applications to networks and functional approximation[D].Minnesota:University of Minnesota,2020. [23]PÓSFAI M,BARABÁSI A L.Network science[M].Cambridge:Cambridge University Press,2016. [24]OTSUKA M,TSUGAWA S.Robustness of network attackstrategies against node sampling and link errors[J].PLoS One,2019,14(9):e0221885. [25]FOURNET J,BARRAT A.Estimating the epidemic risk using non-uniformly sampled contact data[J].Scientific Reports,2017,7(1):9975. [26]QI M,TAN S,CHEN P,et al.Efficient network intervention with sampling information[J].Chaos,Solitons & Fractals,2023,166:112952. [27]CUI Y,LI X,LI J,et al.A survey of sampling method for social media embeddedness relationship[J].ACM Computing Surveys,2022,55(4):1-39. [28]AHMED N,NEVILLE J,KOMPELLA R R.Network sampling via edge-based node selection with graph induction[J].Compu-ter Science Technical Reports,2011,11-016:1747. [29]NOH J D,RIEGER H.Random walks on complex networks[J].Physical Review Letters,2004,92(11):118701. [30]TIAN Y,LAMBIOTTE R.Structural balance and random walks on complex networks with complex weights[J].SIAM Journal on Mathematics of Data Science,2024,6(2):372-399. [31]WU Y,CAO N,ARCHAMBAULT D,et al.Evaluation of graph sampling:A visualization perspective[J].IEEE Transactions on Visualization and Computer Graphics,2016,23(1):401-410. [32]FELD S L.Why Your Friends Have More Friends Than You Do[J].American Journal of Sociology,1991,96(6):1464-1477. [33]COHEN R,HAVLIN S,BEN-AVRAHAM D.Efficient immunization strategies for computer networks and populations[J].Physical Review Letters,2003,91(24):247901. [34]LIU Y,HUANG Q,DONG G,et al.Acquaintance immunization with limited knowledge of network structure[J].New Journal of Physics,2023,25(9):093017. [35]XUE Y,BOGDAN P.Reconstructing missing complex networks against adversarial interventions[J].Nature Communications,2019,10(1):1738. [36]ZHENG Y M,JIA C Y,CHANG Z H,et al.A Degree Corrected Stochastic Block Model for Attributed Networks[J].Journal of Computer Research and Development,2020,57(8):1650-1662. [37]KIM M,LESKOVEC J.The network completion problem:Inferring missing nodes and edges in networks[C]//Proceedings of the 2011 SIAM International Conference on Data Mining.2011:47-58. [38]ZHANG X.Research on Multi-source Data Network Structure Inference and Balance Robustness of Network[D].Shenzhen:Shenzhen University,2020. [39]YANG Z,COHEN W,SALAKHUDINOV R.Revisiting semi-supervised learning with graph embeddings[C]//International Conference on Machine Learning.2016:40-48. [40]ROZEMBERCZKI B,SARKAR R.Characteristic functions ongraphs:Birds of a feather,from statistical descriptors to parametric models[C]//Proceedings of the 29th ACM International Conference on Information & Knowledge Management.2020:1325-1334. [41]ZAFARANI R,LIU H.Users joining multiple sites:Distributions and patterns[C]//Proceedings of the International AAAI Conference on Web and Social Media.2014:635-638. [42]RICHARDSON M,AGRAWAL R,DOMINGOS P.Trustmanagement for the semantic web[C]//International Semantic Web Conference.Springer,2003:351-368. [43]KLIMT B,YANG Y.The enron corpus:A new dataset for email classification research[C]//European Conference on Machine Learning.Springer,2004:217-226. |
|