计算机科学 ›› 2018, Vol. 45 ›› Issue (11): 249-255.doi: 10.11896/j.issn.1002-137X.2018.11.039
朱君鹏, 李晖, 陈梅, 戴震宇
ZHU Jun-peng, LI Hui, CHEN Mei, DAI Zhen-yu
摘要: 抽样作为一种有效的统计分析方法,常被用于大规模图数据分析领域以提升性能。现有的图抽样算法大多存在高度节点或低度节点过度入样的问题,较大程度地影响了算法的性能。复杂网络具有无标度特性,即节点的度服从幂律分布,节点个体之间存在较大差异。在基于点选择策略的抽样方法的基础上,通过结合节点的近似度分布策略,设计并实现了高效无偏的分层图抽样算法SNS。在3个真实的图数据集上的实验结果表明,SNS算法比其他图抽样算法保留了更多的拓扑属性,且执行效率比FFS更高。SNS算法在度的无偏性、抽样结果拓扑属性近似性方面的表现均优于现有算法。
中图分类号:
[1]LESKOVEC J,FALOUTSOS C.Sampling from large graphs [C]∥ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2006:631-636. [2]AHMED N K,NEVILLE J,KOMPELLA R.Network Samp- ling:From Static to Streaming Graphs[J].Acm Transactions on Knowledge Discovery from Data,2013,8(2):1-56. [3]WATTS D J,STROGATZ S H.Collective dynamics of ‘small-world’ networks[J].Nature,1998:393(6684):440-442. [4]FALOUTSOS M,FALOUTSOS P,FALOUTSOS C.On power-law relationships of the Internet topology[J].Acm Sigcomm Computer Communication Review,1999:29(4):251-262. [5]HAN J.Data Mining:Concepts and Techniques[M].San Francisco,CA,USA:Morgan Kaufmann Publishers Inc,2005. [6]HAMERLY G.Making k -means even faster[C] ∥SIAM International Conference on Data Mining 2010.2010:130-140. [7]JORGENSEN M.EM Algorithm[M]∥Encyclopedia of Envi- ronmetrics.John Wiley & Sons,Ltd,2006:267-292. [8]PARK H S,JUN C H.A simple and fast algorithm for K-Medoids clustering[J].Expert Systems with Applications,2009,36(2):3336-3341. [9]DASZYKOWSKI M,WALCZAK B.Density-Based Clustering Methods[M]∥Comprehensive Chemometrics.Elsevier,2010. [10]CAMPELLO R J G B,MOULAVI D,SANDER J.Density- Based Clustering Based on Hierarchical Density Estimates[C]∥Pacific-Asia Conference on Knowledge Discovery & Data Mi-ning.2013:160-172. [11]SNAP[OL].http://snap.stanford.edu/data/index.html. [12]DASGUPTA A,KUMAR R,SIVAKUMAR D.Social sampling[C]∥ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2012:235-243. [13]GJOKA M,KURANT M,BUTTS C T,et al.Walking in Facebook:A Case Study of Unbiased Sampling of OSNs[C]∥INFOCOM,2010 Proceedings IEEE.IEEE,2010:1-9. [14]PALMER C R,FALOUTSOS C.Density biased sampling:an improved method for data mining and clustering[J].Acm Sigmod Record,2000,29(2):82-92. [15]CALLAGHAN M,CALLAGHAN M,CALLAGHAN M,et al.LinkBench:a database benchmark based on the Facebook social graph[C]∥ACM SIGMOD International Conference on Mana-gement of Data.ACM,2013:1185-1196. [16]FLESCA S,GRECO S.Querying graph databases[C]∥International Conference on Extending Database Technology:Advances in Database Technology.Springer-Verlag,2000:510-524. [17]MAASS S,MIN C,KASHYAP S,et al.Mosaic:Processing a Trillion-Edge Graph on a Single Machine[C]∥Twelfth Euro-pean Conference on Computer Systems.ACM,2017:527-543. [18]ZHAO J,WANG P,LUI J C S,et al.Sampling Online Social Networks by Random Walk with Indirect Jumps[OL].http://www.researchgate.net/publication/319391447_Sampling_Online_Social_Networks_by_Random_Walk_with_Indirect_Jumps. [19]WAGNER C,SINGER P,KARIMI F,et al.Sampling from Social Networks with Attributes[C]∥Conference www’17Proceedings of the 26th International Conference on World Wide Wep.2017:1181-1190. [20]HASAN M A.Methods and Applications of Network Sampling[C]∥SIAM Conference on Data Mining.2016. [21]YU L.Sampling and characterizing online social networks[C]∥IISA 2016.2016:245-249. [22]REZVANIAN A,MEYBODI M R.Sampling algorithms for weighted networks[J].Social Network Analysis & Mining,2016,6(1):1-22. [23]VOUDIGARI E,SALAMANOS N,PAPAGEORGIOU T,et al.Rank degree:An efficient algorithm for graph sampling[C]∥IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.IEEE,2016:120-129. [24]ZHAO J,LUI J C S,TOWSLEY D,et al.A tale of three graphs:Sampling design on hybrid social-affiliation networks[C]∥IEEE,International Conference on Data Engineering.IEEE,2015:939-950. [25]CUI Y A,LI X,WANG Z X,et al.A Comparison on Methodologies of Sampling Online Social Media[J].Chinese Journal of Computers,2014,37(8):1859-1876.(in Chinese) 崔颖安,李雪,王志晓,等.在线社交媒体数据抽样方法的比较研究[J].计算机学报,2014,37(8):1859-1876. [26]TANG J T,WANG T,WANG J.Shortest Path Approximate Algorithm for Complex Network Analysis[J].Journal of Software,2011,22(10):2279-2290.(in Chinese) 唐晋韬,王挺,王戟.适合复杂网络分析的最短路径近似算法[J].软件学报,2011,22(10):2279-2290. |
[1] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263 |
[2] | 刘林云, 陈开颜, 李雄伟, 张阳, 谢方方. 基于卷积神经网络的旁路密码分析综述 Overview of Side Channel Analysis Based on Convolutional Neural Network 计算机科学, 2022, 49(5): 296-302. https://doi.org/10.11896/jsjkx.210300286 |
[3] | 王一超, 廖秋承, 左思成, 谢锐, 林新华. 一种ARM处理器面向高性能计算的性能评估 Performance Evaluation of ARM-ISA SoC for High Performance Computing 计算机科学, 2019, 46(8): 95-99. https://doi.org/10.11896/j.issn.1002-137X.2019.08.015 |
[4] | 李红军,崔西宁,牟明,韩伟. 一种面向分布式嵌入式计算机的性能评估模型 Research on Distributed Embedded Computer Performance Evaluation Model 计算机科学, 2017, 44(4): 153-156. https://doi.org/10.11896/j.issn.1002-137X.2017.04.033 |
[5] | 林新华,秦强,李硕,文敏华,松岗聪. 使用Stencil评估Intel AVX2 Vgather指令 Evaluating Intel AVX2 Vgather Instructions with Stencils 计算机科学, 2017, 44(1): 20-24. https://doi.org/10.11896/j.issn.1002-137X.2017.01.004 |
[6] | 王伟,王嘉郡,王明明,张文静,陈金广. 以网络性能为核心的移动自组网Flooding攻击防御技术 Defense Technology Based on Dynamic Space-Time Performance for Flooding Attacks in Mobile Ad Hoc Networks 计算机科学, 2017, 44(1): 159-166. https://doi.org/10.11896/j.issn.1002-137X.2017.01.031 |
[7] | 张成伟,程文青,黑晓军. 基于Android平台的3G移动网络测量研究及性能分析 Measurement Study of 3G Mobile Networks Using Android Platform 计算机科学, 2015, 42(2): 24-28. https://doi.org/10.11896/j.issn.1002-137X.2015.02.005 |
[8] | 蒋一波,王雨晨,王万良,张祯,陈琼. 一种基于机器学习的MANET网络入侵检测性能评估方法研究 Performance Analysis Method for Intrusion Detection in MANETs Based on Machine Learning Algorithms 计算机科学, 2013, 40(Z11): 170-174. |
[9] | 王会梅,鲜明,王国玉. 网络抗拒绝服务攻击性能的集对评估方法 Set Pair Analysis Method for Evaluating Denial of Service Attack Resistance Ability 计算机科学, 2012, 39(4): 53-55. |
[10] | 张丹,赵荣彩,单征,韩林,瞿进. 可重构系统中软硬任务划分方法研究 Research on Hardware/Software Task Partitioning for Reconfigurable System 计算机科学, 2012, 39(3): 276-278. |
[11] | 刘明骞,李兵兵,刘涵. 数字调制信号识别性能的评估方法 Performance Evaluation Algorithm of Digital Modulation Signals Recognition 计算机科学, 2011, 38(5): 64-66. |
[12] | 陈飞 商琳 骆斌 陈世福. 联系发现:一种新的数据挖掘方法综述 计算机科学, 2006, 33(11): 123-127. |
[13] | . 一种兼顾协议正确性验证和性能评估的Petri网方法 计算机科学, 2005, 32(12): 48-52. |
|