Computer Science ›› 2019, Vol. 46 ›› Issue (3): 314-320.doi: 10.11896/j.issn.1002-137X.2019.03.046

• Interdiscipline & Frontier • Previous Articles     Next Articles

Estimating Graphlets via Two Common Substructures Aware Sampling in Social Networks

ZHAO Qian-qian, LV Min, XU Yin-long   

  1. (Anhui Key Laboratory on High Performance Computing,School of Computer Science and Technology, University of Science and Technology of China,Hefei 230026,China)
  • Received:2018-01-31 Revised:2018-05-15 Online:2019-03-15 Published:2019-03-22

Abstract: Graphlets refer to the connected induced subgraphs with small amount of nodes in large-scale network,and have extensive applications in social networks and bioinformatics.Due to extremely high computational costs of exactly counting graphlets,approximately estimating graphlets concentrations via random walk sampling algorithms already becomes the mainstream approach.As node size k increases,the number of k-graphlets increases rapidly and their structures change dramatically,so it is a challenge to quickly estimate the relative frequency of all types of graphlets (graphlet concentrations) in a large-scale network.Aiming at this problem,this paper proposed a novel sampling algorithm,namely common substructures path and 3-star based graphlets sampling via random walk (CSRW2),to efficiently estimate graphlets concentrations.Given k (k=4,5),apart from sampling path via random walk,CSRW2 also samplesano-ther substructure 3-star,and then derives graphlets concentrations by proportional amplification to find the dense graphlets with less appearance more efficiently and adapt to the complex structural changes.Experimental evaluations on real networks demonstrate that CSRW2 can estimate k-graphlets in a uniform framework.CSRW2 outperforms the representative methods in terms of accuracy and is more accurate for the k-graphlets with more edges and less appearances in graphs.For example,when 5-graphlets in sofb-Penn94 is estimated,the average NRMSE of all 5-graphlets is decreased to 0.22 via CSRW2 in contrast to 0.8 obtained by WRW.

Key words: Graphlet, Graphlet concentration, Random walk, Sampling algorithm, Social network, Unbiased estimation

CLC Number: 

  • TP393
[1]MILO R,SHEN-ORR S,ITZKOVITZ S,et al.Network motifs:simple building blocks of complex networks[J].Science,2002,298(5594):824-827.
[2]UGANDER J,BACKSTROM L,KLEINBERG J.Subgraph fre-
quencies:mapping the empirical and extremal geography of large graph collections[C]∥Proceedings of the 22nd international conference on World Wide Web.ACM,2013:1307-1318.
[3]AKOGLU L,TONG H,KOUTRA D.Graph based anomaly detection and description:a survey[J].Data Mining and Know-ledge Discovery,2015,29(3):626-688.
[4]MILENKOVIC′ T,PRULJ N.Uncovering Biological Network Function via Graphlet Degree Signatures[J].Cancer Informa-tics,2008,6:257-273.
[5]HARDIMAN S J,KATZIR L.Estimating clustering coefficients and size of social networks via random walk[C]∥Proceedings of the 22nd International Conference on World Wide Web.ACM,2013:539-550.
[6]WANG J C,CHANG C H.How online social ties and product-related risks influence purchase intentions:A Facebook experiment[J].Electronic Commerce Research and Applications,2013,12(5):337-346.
[7]PINAR A,SESHADHRI C,VISHAL V.Escape:Efficiently
counting all 5-vertex subgraphs[C]∥Proceedings of the 26th International Conference on World Wide Web.International World Wide Web Conferences Steering Committee,2017:1431-1440.
[8]HOCˇEVAR T,DEMAR J.A combinatorial approach to graphletcounting[J].Bioinformatics,2014,30(4):559-565.
[9]SESHADHRI C,PINAR A,KOLDA T G.Triadic measures on graphs:The power of wedge sampling[C]∥Proceedings of the 2013 SIAM International Conference on Data Mining.Society for Industrial and Applied Mathematics,2013:10-18.
[10]JHA M,SESHADHRI C,PINAR A.Path sampling:A fast and provable method for estimating 4-vertex subgraph counts[C]∥Proceedings of the 24th International Conference on World Wide Web.International World Wide Web Conferences Steering Committee,2015:495-505.
[11]HAN G,SETHU H.Waddling random walk:Fast and accurate mining of motif statistics in large graphs[C]∥Proceedings of the 2016 IEEE 16th International Conference on Data Mining.IEEE,2016:181-190.
[12]BHUIYAN M A,RAHMAN M,AL HASAN M.GUISE:Uniform Sampling of Graphlets for Large Graph Analysis[C]∥Proceedings of the 2012 IEEE 12th International Conference on Data Mining.IEEE Computer Society,2012:91-100.
[13]CHEN X,LI Y,WANG P,et al.A general framework for estimating graphlet statistics via random walk[J].Proceedings of the VLDB Endowment,2016,10(3):253-264.
[14]LESKOVEC J,KREVL A.SNAP Datasets:Stanford Large Network Dataset Collection[OL].http://snap.stanford.edu/data.
[1] WANG Jian, PENG Yu-qi, ZHAO Yu-fei, YANG Jian. Survey of Social Network Public Opinion Information Extraction Based on Deep Learning [J]. Computer Science, 2022, 49(8): 279-293.
[2] WEI Peng, MA Yu-liang, YUAN Ye, WU An-biao. Study on Temporal Influence Maximization Driven by User Behavior [J]. Computer Science, 2022, 49(6): 119-126.
[3] YU Ai-xin, FENG Xiu-fang, SUN Jing-yu. Social Trust Recommendation Algorithm Combining Item Similarity [J]. Computer Science, 2022, 49(5): 144-151.
[4] CHANG Ya-wen, YANG Bo, GAO Yue-lin, HUANG Jing-yun. Modeling and Analysis of WeChat Official Account Information Dissemination Based on SEIR [J]. Computer Science, 2022, 49(4): 56-66.
[5] ZUO Yuan-lin, GONG Yue-jiao, CHEN Wei-neng. Budget-aware Influence Maximization in Social Networks [J]. Computer Science, 2022, 49(4): 100-109.
[6] GUO Lei, MA Ting-huai. Friend Closeness Based User Matching [J]. Computer Science, 2022, 49(3): 113-120.
[7] SHAO Yu, CHEN Ling, LIU Wei. Maximum Likelihood-based Method for Locating Source of Negative Influence Spreading Under Independent Cascade Model [J]. Computer Science, 2022, 49(2): 204-215.
[8] WANG Jian, WANG Yu-cui, HUANG Meng-jie. False Information in Social Networks:Definition,Detection and Control [J]. Computer Science, 2021, 48(8): 263-277.
[9] TAN Qi, ZHANG Feng-li, WANG Ting, WANG Rui-jin, ZHOU Shi-jie. Social Network User Influence Evaluation Algorithm Integrating Structure Centrality [J]. Computer Science, 2021, 48(7): 124-129.
[10] ZHANG Ren-zhi, ZHU Yan. Malicious User Detection Method for Social Network Based on Active Learning [J]. Computer Science, 2021, 48(6): 332-337.
[11] BAO Zhi-qiang, CHEN Wei-dong. Rumor Source Detection in Social Networks via Maximum-a-Posteriori Estimation [J]. Computer Science, 2021, 48(4): 243-248.
[12] ZHANG Shao-jie, LU Xu-dong, GUO Wei, WANG Shi-peng, HE Wei. Prevention of Dishonest Behavior in Supply-Demand Matching [J]. Computer Science, 2021, 48(4): 303-308.
[13] ZHANG Hao-chen, CAI Ying, XIA Hong-ke. Delivery Probability Based Routing Algorithm for Vehicular Social Network [J]. Computer Science, 2021, 48(3): 289-294.
[14] YUAN De-yu, CHEN Shi-cong, GAO Jian, WANG Xiao-juan. Intervention Algorithm for Distorted Information in Online Social Networks Based on Stackelberg Game [J]. Computer Science, 2021, 48(3): 313-319.
[15] TAN Qi, ZHANG Feng-li, ZHANG Zhi-yang, CHEN Xue-qin. Modeling Methods of Social Network User Influence [J]. Computer Science, 2021, 48(2): 76-86.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!