Computer Science ›› 2018, Vol. 45 ›› Issue (11): 249-255.doi: 10.11896/j.issn.1002-137X.2018.11.039

• Artificial Intelligence • Previous Articles     Next Articles

SNS:A Fast and Unbiased Stratified Graph Sampling Algorithm

ZHU Jun-peng, LI Hui, CHEN Mei, DAI Zhen-yu   

  1. (College of Computer Science and Technology,Guizhou University,Guiyang 550025,China)
    (Guizhou Engineering Laboratory of Advance Computing and Medical Information Service,Guiyang 550025,China)
  • Received:2017-10-08 Published:2019-02-25

Abstract: As an effective method of statistical analysis,sampling is commonly used in the field of analyzing the large-scale graph data to improve the performance.However,most of the existing graph sampling algorithms often have the problem of excessive sampling of high and low nodes,resulting in lower accuracy derived from the scale-free characteristic of complex networks.The scale-free characteristic means the degrees of different nodes follow a power law distribution,and the difference between nodes is huge.On the basis of the sampling method on node selection strategy,combining the approximate degree distribution strategy of nodes,this paper proposed and realized an efficient and unbiased stratified graph sampling algorithm named SNS.The experimental results show that SNS algorithm preserves more topological properties on three real data sets than other graph sampling algorithms,and consumes less time than FFS algorithm.Therefore,SNS algorithm is superior to the existing algorithms in terms of the unbiasedness of degree and the accuracy of sampling results.

Key words: Biased sampling, Graph sampling, Performance estimation, Stratified sampling, Vector clustering

CLC Number: 

  • TP393
[1]LESKOVEC J,FALOUTSOS C.Sampling from large graphs [C]∥ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2006:631-636.
[2]AHMED N K,NEVILLE J,KOMPELLA R.Network Samp- ling:From Static to Streaming Graphs[J].Acm Transactions on Knowledge Discovery from Data,2013,8(2):1-56.
[3]WATTS D J,STROGATZ S H.Collective dynamics of ‘small-world’ networks[J].Nature,1998:393(6684):440-442.
[4]FALOUTSOS M,FALOUTSOS P,FALOUTSOS C.On power-law relationships of the Internet topology[J].Acm Sigcomm Computer Communication Review,1999:29(4):251-262.
[5]HAN J.Data Mining:Concepts and Techniques[M].San Francisco,CA,USA:Morgan Kaufmann Publishers Inc,2005.
[6]HAMERLY G.Making k -means even faster[C] ∥SIAM International Conference on Data Mining 2010.2010:130-140.
[7]JORGENSEN M.EM Algorithm[M]∥Encyclopedia of Envi- ronmetrics.John Wiley & Sons,Ltd,2006:267-292.
[8]PARK H S,JUN C H.A simple and fast algorithm for K-Medoids clustering[J].Expert Systems with Applications,2009,36(2):3336-3341.
[9]DASZYKOWSKI M,WALCZAK B.Density-Based Clustering Methods[M]∥Comprehensive Chemometrics.Elsevier,2010.
[10]CAMPELLO R J G B,MOULAVI D,SANDER J.Density- Based Clustering Based on Hierarchical Density Estimates[C]∥Pacific-Asia Conference on Knowledge Discovery & Data Mi-ning.2013:160-172.
[11]SNAP[OL].http://snap.stanford.edu/data/index.html.
[12]DASGUPTA A,KUMAR R,SIVAKUMAR D.Social sampling[C]∥ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2012:235-243.
[13]GJOKA M,KURANT M,BUTTS C T,et al.Walking in Facebook:A Case Study of Unbiased Sampling of OSNs[C]∥INFOCOM,2010 Proceedings IEEE.IEEE,2010:1-9.
[14]PALMER C R,FALOUTSOS C.Density biased sampling:an improved method for data mining and clustering[J].Acm Sigmod Record,2000,29(2):82-92.
[15]CALLAGHAN M,CALLAGHAN M,CALLAGHAN M,et al.LinkBench:a database benchmark based on the Facebook social graph[C]∥ACM SIGMOD International Conference on Mana-gement of Data.ACM,2013:1185-1196.
[16]FLESCA S,GRECO S.Querying graph databases[C]∥International Conference on Extending Database Technology:Advances in Database Technology.Springer-Verlag,2000:510-524.
[17]MAASS S,MIN C,KASHYAP S,et al.Mosaic:Processing a Trillion-Edge Graph on a Single Machine[C]∥Twelfth Euro-pean Conference on Computer Systems.ACM,2017:527-543.
[18]ZHAO J,WANG P,LUI J C S,et al.Sampling Online Social Networks by Random Walk with Indirect Jumps[OL].http://www.researchgate.net/publication/319391447_Sampling_Online_Social_Networks_by_Random_Walk_with_Indirect_Jumps.
[19]WAGNER C,SINGER P,KARIMI F,et al.Sampling from Social Networks with Attributes[C]∥Conference www’17Proceedings of the 26th International Conference on World Wide Wep.2017:1181-1190.
[20]HASAN M A.Methods and Applications of Network Sampling[C]∥SIAM Conference on Data Mining.2016.
[21]YU L.Sampling and characterizing online social networks[C]∥IISA 2016.2016:245-249.
[22]REZVANIAN A,MEYBODI M R.Sampling algorithms for weighted networks[J].Social Network Analysis & Mining,2016,6(1):1-22.
[23]VOUDIGARI E,SALAMANOS N,PAPAGEORGIOU T,et al.Rank degree:An efficient algorithm for graph sampling[C]∥IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.IEEE,2016:120-129.
[24]ZHAO J,LUI J C S,TOWSLEY D,et al.A tale of three graphs:Sampling design on hybrid social-affiliation networks[C]∥IEEE,International Conference on Data Engineering.IEEE,2015:939-950.
[25]CUI Y A,LI X,WANG Z X,et al.A Comparison on Methodologies of Sampling Online Social Media[J].Chinese Journal of Computers,2014,37(8):1859-1876.(in Chinese)
崔颖安,李雪,王志晓,等.在线社交媒体数据抽样方法的比较研究[J].计算机学报,2014,37(8):1859-1876.
[26]TANG J T,WANG T,WANG J.Shortest Path Approximate Algorithm for Complex Network Analysis[J].Journal of Software,2011,22(10):2279-2290.(in Chinese)
唐晋韬,王挺,王戟.适合复杂网络分析的最短路径近似算法[J].软件学报,2011,22(10):2279-2290.
[1] LU Chen-yang, DENG Su, MA Wu-bin, WU Ya-hui, ZHOU Hao-hao. Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients [J]. Computer Science, 2022, 49(9): 183-193.
[2] JIANG Yong-sen, XIAO Quan and WANG Shou-jue. Non-parametric Dynamic Background Modeling Based on Direction Feature of Vector [J]. Computer Science, 2016, 43(3): 291-295.
[3] . Research on Hardware/Software Task Partitioning for Reconfigurable System [J]. Computer Science, 2012, 39(3): 276-278.
[4] WANG Kao-jie,ZHENG Xue-feng,SONG Yi-ding,AN Feng-liang. Local Cluster Based Biased Sampling of Trajectory Stream [J]. Computer Science, 2011, 38(5): 135-137.
[5] HU Xin-tao,LI Gang,GUO Lei. Cortical Surface Parcellation Using STAPLE Algorithm [J]. Computer Science, 2011, 38(3): 279-282.
[6] MA Zhen-yuan,ZHOU Jie,CHEN Chu,ZHANG Ling. Segmental Measurement Based Approach to Estimate Internet End-to-end Performance Parameter [J]. Computer Science, 2010, 37(3): 138-140.
[7] YU Bo ,ZHU Dong-hua ,LIU Song ,ZHENG Tao (School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China). [J]. Computer Science, 2009, 36(2): 207-209.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!