计算机科学 ›› 2019, Vol. 46 ›› Issue (3): 314-320.doi: 10.11896/j.issn.1002-137X.2019.03.046

• 交叉与前沿 • 上一篇    下一篇

基于两种子结构感知的社交网络Graphlets采样估计算法

赵倩倩,吕敏,许胤龙   

  1. (中国科学技术大学计算机科学与技术学院高性能计算安徽省重点实验室 合肥 230026)
  • 收稿日期:2018-01-31 修回日期:2018-05-15 出版日期:2019-03-15 发布日期:2019-03-22
  • 通讯作者: 吕敏(1977-),女,博士,讲师,CCF会员,主要研究方向为图计算、数据隐私与安全,E-mail:lvmin05@ustc.edu.cn
  • 作者简介:赵倩倩(1993-),女,硕士生,主要研究方向为社交网络、图计算;许胤龙(1963-),男,博士,教授,博士生导师,CCF会员,主要研究方向为存储系统、图计算。
  • 基金资助:
    国家自然科学基金面上项目(61672486)资助

Estimating Graphlets via Two Common Substructures Aware Sampling in Social Networks

ZHAO Qian-qian, LV Min, XU Yin-long   

  1. (Anhui Key Laboratory on High Performance Computing,School of Computer Science and Technology, University of Science and Technology of China,Hefei 230026,China)
  • Received:2018-01-31 Revised:2018-05-15 Online:2019-03-15 Published:2019-03-22

摘要: graphlets是指大规模网络中节点数目较少的连通诱导子图,在社交网络和生物信息学领域有着广泛的应用。由于精确计数的计算成本较高,目前大多采用随机游走采样算法来近似估计graphlets的频率。随着节点数目的增多,graphlets的种类数增长迅速且结构变化复杂,快速估计大规模网络中所有种类的graphlets的频率是一项挑战。文中提出了基于两种子结构的随机游走采样算法CSRW2来估计graphlets频率,即给定graphlets节点数k(k=4,5),通过采样k-graphlets的子结构(k-1)-path和3-star得到两种样本,之后用比例放大法综合,以高效估计graphlets并适应graphlets结构的复杂变化。实验结果表明,CSRW2能以统一的框架估计所有 k-graphlets类型的频率,其估计精度优于现有代表性算法,更适用于频率较低且结构较稠密的graphlets。例如,用CSRW2估计真实网络sofb-Penn94中的5-graphlets,当样本数为2万时,标准均方根误差的平均值由WRW算法的0.8降低至CSRW2算法的0.22左右。

关键词: Graphlet, Graphlet频率, 采样算法, 社交网络, 随机游走, 无偏估计

Abstract: Graphlets refer to the connected induced subgraphs with small amount of nodes in large-scale network,and have extensive applications in social networks and bioinformatics.Due to extremely high computational costs of exactly counting graphlets,approximately estimating graphlets concentrations via random walk sampling algorithms already becomes the mainstream approach.As node size k increases,the number of k-graphlets increases rapidly and their structures change dramatically,so it is a challenge to quickly estimate the relative frequency of all types of graphlets (graphlet concentrations) in a large-scale network.Aiming at this problem,this paper proposed a novel sampling algorithm,namely common substructures path and 3-star based graphlets sampling via random walk (CSRW2),to efficiently estimate graphlets concentrations.Given k (k=4,5),apart from sampling path via random walk,CSRW2 also samplesano-ther substructure 3-star,and then derives graphlets concentrations by proportional amplification to find the dense graphlets with less appearance more efficiently and adapt to the complex structural changes.Experimental evaluations on real networks demonstrate that CSRW2 can estimate k-graphlets in a uniform framework.CSRW2 outperforms the representative methods in terms of accuracy and is more accurate for the k-graphlets with more edges and less appearances in graphs.For example,when 5-graphlets in sofb-Penn94 is estimated,the average NRMSE of all 5-graphlets is decreased to 0.22 via CSRW2 in contrast to 0.8 obtained by WRW.

Key words: Graphlet, Graphlet concentration, Random walk, Sampling algorithm, Social network, Unbiased estimation

中图分类号: 

  • TP393
[1]MILO R,SHEN-ORR S,ITZKOVITZ S,et al.Network motifs:simple building blocks of complex networks[J].Science,2002,298(5594):824-827.
[2]UGANDER J,BACKSTROM L,KLEINBERG J.Subgraph fre-
quencies:mapping the empirical and extremal geography of large graph collections[C]∥Proceedings of the 22nd international conference on World Wide Web.ACM,2013:1307-1318.
[3]AKOGLU L,TONG H,KOUTRA D.Graph based anomaly detection and description:a survey[J].Data Mining and Know-ledge Discovery,2015,29(3):626-688.
[4]MILENKOVIC′ T,PRULJ N.Uncovering Biological Network Function via Graphlet Degree Signatures[J].Cancer Informa-tics,2008,6:257-273.
[5]HARDIMAN S J,KATZIR L.Estimating clustering coefficients and size of social networks via random walk[C]∥Proceedings of the 22nd International Conference on World Wide Web.ACM,2013:539-550.
[6]WANG J C,CHANG C H.How online social ties and product-related risks influence purchase intentions:A Facebook experiment[J].Electronic Commerce Research and Applications,2013,12(5):337-346.
[7]PINAR A,SESHADHRI C,VISHAL V.Escape:Efficiently
counting all 5-vertex subgraphs[C]∥Proceedings of the 26th International Conference on World Wide Web.International World Wide Web Conferences Steering Committee,2017:1431-1440.
[8]HOCˇEVAR T,DEMAR J.A combinatorial approach to graphletcounting[J].Bioinformatics,2014,30(4):559-565.
[9]SESHADHRI C,PINAR A,KOLDA T G.Triadic measures on graphs:The power of wedge sampling[C]∥Proceedings of the 2013 SIAM International Conference on Data Mining.Society for Industrial and Applied Mathematics,2013:10-18.
[10]JHA M,SESHADHRI C,PINAR A.Path sampling:A fast and provable method for estimating 4-vertex subgraph counts[C]∥Proceedings of the 24th International Conference on World Wide Web.International World Wide Web Conferences Steering Committee,2015:495-505.
[11]HAN G,SETHU H.Waddling random walk:Fast and accurate mining of motif statistics in large graphs[C]∥Proceedings of the 2016 IEEE 16th International Conference on Data Mining.IEEE,2016:181-190.
[12]BHUIYAN M A,RAHMAN M,AL HASAN M.GUISE:Uniform Sampling of Graphlets for Large Graph Analysis[C]∥Proceedings of the 2012 IEEE 12th International Conference on Data Mining.IEEE Computer Society,2012:91-100.
[13]CHEN X,LI Y,WANG P,et al.A general framework for estimating graphlet statistics via random walk[J].Proceedings of the VLDB Endowment,2016,10(3):253-264.
[14]LESKOVEC J,KREVL A.SNAP Datasets:Stanford Large Network Dataset Collection[OL].http://snap.stanford.edu/data.
[1] 王剑, 彭雨琦, 赵宇斐, 杨健.
基于深度学习的社交网络舆情信息抽取方法综述
Survey of Social Network Public Opinion Information Extraction Based on Deep Learning
计算机科学, 2022, 49(8): 279-293. https://doi.org/10.11896/jsjkx.220300099
[2] 魏鹏, 马玉亮, 袁野, 吴安彪.
用户行为驱动的时序影响力最大化问题研究
Study on Temporal Influence Maximization Driven by User Behavior
计算机科学, 2022, 49(6): 119-126. https://doi.org/10.11896/jsjkx.210700145
[3] 余皑欣, 冯秀芳, 孙静宇.
结合物品相似性的社交信任推荐算法
Social Trust Recommendation Algorithm Combining Item Similarity
计算机科学, 2022, 49(5): 144-151. https://doi.org/10.11896/jsjkx.210300217
[4] 畅雅雯, 杨波, 高玥琳, 黄靖云.
基于SEIR的微信公众号信息传播建模与分析
Modeling and Analysis of WeChat Official Account Information Dissemination Based on SEIR
计算机科学, 2022, 49(4): 56-66. https://doi.org/10.11896/jsjkx.210900169
[5] 左园林, 龚月姣, 陈伟能.
成本受限条件下的社交网络影响最大化方法
Budget-aware Influence Maximization in Social Networks
计算机科学, 2022, 49(4): 100-109. https://doi.org/10.11896/jsjkx.210300228
[6] 郭磊, 马廷淮.
基于好友亲密度的用户匹配
Friend Closeness Based User Matching
计算机科学, 2022, 49(3): 113-120. https://doi.org/10.11896/jsjkx.210200137
[7] 王剑, 王玉翠, 黄梦杰.
社交网络中的虚假信息:定义、检测及控制
False Information in Social Networks:Definition,Detection and Control
计算机科学, 2021, 48(8): 263-277. https://doi.org/10.11896/jsjkx.210300053
[8] 谭琪, 张凤荔, 王婷, 王瑞锦, 周世杰.
融入结构度中心性的社交网络用户影响力评估算法
Social Network User Influence Evaluation Algorithm Integrating Structure Centrality
计算机科学, 2021, 48(7): 124-129. https://doi.org/10.11896/jsjkx.200600096
[9] 张人之, 朱焱.
基于主动学习的社交网络恶意用户检测方法
Malicious User Detection Method for Social Network Based on Active Learning
计算机科学, 2021, 48(6): 332-337. https://doi.org/10.11896/jsjkx.200700151
[10] 鲍志强, 陈卫东.
基于最大后验估计的谣言源定位器
Rumor Source Detection in Social Networks via Maximum-a-Posteriori Estimation
计算机科学, 2021, 48(4): 243-248. https://doi.org/10.11896/jsjkx.200400053
[11] 张少杰, 鹿旭东, 郭伟, 王世鹏, 何伟.
供需匹配中的非诚信行为预防
Prevention of Dishonest Behavior in Supply-Demand Matching
计算机科学, 2021, 48(4): 303-308. https://doi.org/10.11896/jsjkx.200900090
[12] 袁得嵛, 陈世聪, 高见, 王小娟.
基于斯塔克尔伯格博弈的在线社交网络扭曲信息干预算法
Intervention Algorithm for Distorted Information in Online Social Networks Based on Stackelberg Game
计算机科学, 2021, 48(3): 313-319. https://doi.org/10.11896/jsjkx.200400079
[13] 谭琪, 张凤荔, 张志扬, 陈学勤.
社交网络用户影响力的建模方法
Modeling Methods of Social Network User Influence
计算机科学, 2021, 48(2): 76-86. https://doi.org/10.11896/jsjkx.191200102
[14] 郁友琴, 李弼程.
基于多粒度文本特征表示的微博用户兴趣识别
Microblog User Interest Recognition Based on Multi-granularity Text Feature Representation
计算机科学, 2021, 48(12): 219-225. https://doi.org/10.11896/jsjkx.201100128
[15] 刘丹, 赵森, 颜志良, 赵静, 王会青.
基于堆叠自动编码器的miRNA-疾病关联预测方法
miRNA-disease Association Prediction Model Based on Stacked Autoencoder
计算机科学, 2021, 48(10): 114-120. https://doi.org/10.11896/jsjkx.200900169
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!