计算机科学 ›› 2019, Vol. 46 ›› Issue (3): 314-320.doi: 10.11896/j.issn.1002-137X.2019.03.046
赵倩倩,吕敏,许胤龙
ZHAO Qian-qian, LV Min, XU Yin-long
摘要: graphlets是指大规模网络中节点数目较少的连通诱导子图,在社交网络和生物信息学领域有着广泛的应用。由于精确计数的计算成本较高,目前大多采用随机游走采样算法来近似估计graphlets的频率。随着节点数目的增多,graphlets的种类数增长迅速且结构变化复杂,快速估计大规模网络中所有种类的graphlets的频率是一项挑战。文中提出了基于两种子结构的随机游走采样算法CSRW2来估计graphlets频率,即给定graphlets节点数k(k=4,5),通过采样k-graphlets的子结构(k-1)-path和3-star得到两种样本,之后用比例放大法综合,以高效估计graphlets并适应graphlets结构的复杂变化。实验结果表明,CSRW2能以统一的框架估计所有 k-graphlets类型的频率,其估计精度优于现有代表性算法,更适用于频率较低且结构较稠密的graphlets。例如,用CSRW2估计真实网络sofb-Penn94中的5-graphlets,当样本数为2万时,标准均方根误差的平均值由WRW算法的0.8降低至CSRW2算法的0.22左右。
中图分类号:
[1]MILO R,SHEN-ORR S,ITZKOVITZ S,et al.Network motifs:simple building blocks of complex networks[J].Science,2002,298(5594):824-827. [2]UGANDER J,BACKSTROM L,KLEINBERG J.Subgraph fre- quencies:mapping the empirical and extremal geography of large graph collections[C]∥Proceedings of the 22nd international conference on World Wide Web.ACM,2013:1307-1318. [3]AKOGLU L,TONG H,KOUTRA D.Graph based anomaly detection and description:a survey[J].Data Mining and Know-ledge Discovery,2015,29(3):626-688. [4]MILENKOVIC′ T,PRULJ N.Uncovering Biological Network Function via Graphlet Degree Signatures[J].Cancer Informa-tics,2008,6:257-273. [5]HARDIMAN S J,KATZIR L.Estimating clustering coefficients and size of social networks via random walk[C]∥Proceedings of the 22nd International Conference on World Wide Web.ACM,2013:539-550. [6]WANG J C,CHANG C H.How online social ties and product-related risks influence purchase intentions:A Facebook experiment[J].Electronic Commerce Research and Applications,2013,12(5):337-346. [7]PINAR A,SESHADHRI C,VISHAL V.Escape:Efficiently counting all 5-vertex subgraphs[C]∥Proceedings of the 26th International Conference on World Wide Web.International World Wide Web Conferences Steering Committee,2017:1431-1440. [8]HOCˇEVAR T,DEMAR J.A combinatorial approach to graphletcounting[J].Bioinformatics,2014,30(4):559-565. [9]SESHADHRI C,PINAR A,KOLDA T G.Triadic measures on graphs:The power of wedge sampling[C]∥Proceedings of the 2013 SIAM International Conference on Data Mining.Society for Industrial and Applied Mathematics,2013:10-18. [10]JHA M,SESHADHRI C,PINAR A.Path sampling:A fast and provable method for estimating 4-vertex subgraph counts[C]∥Proceedings of the 24th International Conference on World Wide Web.International World Wide Web Conferences Steering Committee,2015:495-505. [11]HAN G,SETHU H.Waddling random walk:Fast and accurate mining of motif statistics in large graphs[C]∥Proceedings of the 2016 IEEE 16th International Conference on Data Mining.IEEE,2016:181-190. [12]BHUIYAN M A,RAHMAN M,AL HASAN M.GUISE:Uniform Sampling of Graphlets for Large Graph Analysis[C]∥Proceedings of the 2012 IEEE 12th International Conference on Data Mining.IEEE Computer Society,2012:91-100. [13]CHEN X,LI Y,WANG P,et al.A general framework for estimating graphlet statistics via random walk[J].Proceedings of the VLDB Endowment,2016,10(3):253-264. [14]LESKOVEC J,KREVL A.SNAP Datasets:Stanford Large Network Dataset Collection[OL].http://snap.stanford.edu/data. |
[1] | 王剑, 彭雨琦, 赵宇斐, 杨健. 基于深度学习的社交网络舆情信息抽取方法综述 Survey of Social Network Public Opinion Information Extraction Based on Deep Learning 计算机科学, 2022, 49(8): 279-293. https://doi.org/10.11896/jsjkx.220300099 |
[2] | 魏鹏, 马玉亮, 袁野, 吴安彪. 用户行为驱动的时序影响力最大化问题研究 Study on Temporal Influence Maximization Driven by User Behavior 计算机科学, 2022, 49(6): 119-126. https://doi.org/10.11896/jsjkx.210700145 |
[3] | 余皑欣, 冯秀芳, 孙静宇. 结合物品相似性的社交信任推荐算法 Social Trust Recommendation Algorithm Combining Item Similarity 计算机科学, 2022, 49(5): 144-151. https://doi.org/10.11896/jsjkx.210300217 |
[4] | 畅雅雯, 杨波, 高玥琳, 黄靖云. 基于SEIR的微信公众号信息传播建模与分析 Modeling and Analysis of WeChat Official Account Information Dissemination Based on SEIR 计算机科学, 2022, 49(4): 56-66. https://doi.org/10.11896/jsjkx.210900169 |
[5] | 左园林, 龚月姣, 陈伟能. 成本受限条件下的社交网络影响最大化方法 Budget-aware Influence Maximization in Social Networks 计算机科学, 2022, 49(4): 100-109. https://doi.org/10.11896/jsjkx.210300228 |
[6] | 郭磊, 马廷淮. 基于好友亲密度的用户匹配 Friend Closeness Based User Matching 计算机科学, 2022, 49(3): 113-120. https://doi.org/10.11896/jsjkx.210200137 |
[7] | 王剑, 王玉翠, 黄梦杰. 社交网络中的虚假信息:定义、检测及控制 False Information in Social Networks:Definition,Detection and Control 计算机科学, 2021, 48(8): 263-277. https://doi.org/10.11896/jsjkx.210300053 |
[8] | 谭琪, 张凤荔, 王婷, 王瑞锦, 周世杰. 融入结构度中心性的社交网络用户影响力评估算法 Social Network User Influence Evaluation Algorithm Integrating Structure Centrality 计算机科学, 2021, 48(7): 124-129. https://doi.org/10.11896/jsjkx.200600096 |
[9] | 张人之, 朱焱. 基于主动学习的社交网络恶意用户检测方法 Malicious User Detection Method for Social Network Based on Active Learning 计算机科学, 2021, 48(6): 332-337. https://doi.org/10.11896/jsjkx.200700151 |
[10] | 鲍志强, 陈卫东. 基于最大后验估计的谣言源定位器 Rumor Source Detection in Social Networks via Maximum-a-Posteriori Estimation 计算机科学, 2021, 48(4): 243-248. https://doi.org/10.11896/jsjkx.200400053 |
[11] | 张少杰, 鹿旭东, 郭伟, 王世鹏, 何伟. 供需匹配中的非诚信行为预防 Prevention of Dishonest Behavior in Supply-Demand Matching 计算机科学, 2021, 48(4): 303-308. https://doi.org/10.11896/jsjkx.200900090 |
[12] | 袁得嵛, 陈世聪, 高见, 王小娟. 基于斯塔克尔伯格博弈的在线社交网络扭曲信息干预算法 Intervention Algorithm for Distorted Information in Online Social Networks Based on Stackelberg Game 计算机科学, 2021, 48(3): 313-319. https://doi.org/10.11896/jsjkx.200400079 |
[13] | 谭琪, 张凤荔, 张志扬, 陈学勤. 社交网络用户影响力的建模方法 Modeling Methods of Social Network User Influence 计算机科学, 2021, 48(2): 76-86. https://doi.org/10.11896/jsjkx.191200102 |
[14] | 郁友琴, 李弼程. 基于多粒度文本特征表示的微博用户兴趣识别 Microblog User Interest Recognition Based on Multi-granularity Text Feature Representation 计算机科学, 2021, 48(12): 219-225. https://doi.org/10.11896/jsjkx.201100128 |
[15] | 刘丹, 赵森, 颜志良, 赵静, 王会青. 基于堆叠自动编码器的miRNA-疾病关联预测方法 miRNA-disease Association Prediction Model Based on Stacked Autoencoder 计算机科学, 2021, 48(10): 114-120. https://doi.org/10.11896/jsjkx.200900169 |
|