计算机科学 ›› 2020, Vol. 47 ›› Issue (10): 83-90.doi: 10.11896/jsjkx.190900014
王旭1, 杨晓春2
WANG Xu1, YANG Xiao-chun2
摘要: 三角形计数旨在计算图中全局三角形和局部三角形的数量,是图数据挖掘中的一类重要工作。三角形的数量被广泛应用于角色识别、推荐系统、社区发现、垃圾邮件和欺诈检测等领域。在以流形式给出的图中,边具有时间性,同时现实生活中的图存在着大量的重复边。为充分利用图中的时间信息以挖掘网络知识,研究在多图流上计算滑动窗口内全局和局部三角形数量的问题,使用窗口机制同时研究多个窗口以利用隐含的时间关系获取更多信息。文中提出基于FLINK窗口操作的三角形计数算法和基于滑动窗口的三角形增量计数算法,以现有的边采样工作为基础,使用边集存储窗口历史数据实现一遍流计算,从而准确地计算面向多图流的滑动窗口内全局和局部三角形数量。基于FLINK窗口操作的三角形计数算法使用FLINK提供的窗口机制,基于滑动窗口的三角形增量计数算法,通过计算窗口滑入和滑出数据来实现窗口计数,避免了相邻两个窗口间重合边的大量重复计算,无缝地处理多个时间窗口,对于滑入和滑出数据中的重复数据,使用去重机制来进一步减小计算量。理论证明两种算法可以实现滑动窗口内三角形准确计数,并通过实验分析了窗口大小、滑动距离、数据分布和数据流速等因素对窗口处理时间的影响。与TRIEST算法相比,当窗口较小时,基于FLINK窗口操作的三角形计数算法和基于滑动窗口的三角形增量计数算法速度更快;当窗口较大时,保证了计算结果的准确性。
中图分类号:
[1]WELSER H T,GLEAVE E,FISHER D,et al.Visualizing the signatures of social roles in online discussion groups[J].Journal of Social Structure,2007,8(2):1-32. [2]TSOURAKAKIS C E,DRINEAS P,MICHELAKIS E,et al.Spectral counting of triangles via element-wise sparsifification and triangle-based link recommendation[J].Social Network Analysis & Mining,2011,1(2):75-81. [3]GLEICH D F,SESHADHRI C.Vertex neighborhoods,low conductance cuts,and good seeds for local community methods[C]//18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York NY:ACM Press,2012:597-605. [4]BECCHETTI L,BOLDI P,CASTILLO C,et al.Efficient semi-streaming algorithms for local triangle counting in massive graphs[C]//ACM Conference on Knowledge Discovery and Data Mining.New York NY:ACM Press,2008:16-24. [5]KAVASSERY-PARAKKAT N,HANJANI K M,PAVAN A.Improved Triangle Counting in Graph Streams:Power of Multi-Sampling[C]//IEEE International Conference on Advances in Social Networks Analysis and Mining.Washington,DC:IEEE Computer Society Press,2018:28-31,33. [6]JHA M,SESHADHRI C,PINAR A.Counting Triangles in Real-World Graph Streams:Dealing with Repeated Edges and Time Windows[C]//49th Asilomar Conference on Signals,Systems and Computers.Washington DC:IEEE Computer Society Press,2015:1507-1514. [7]BERA S K,CHAKRABARTI A.Towards Tighter Space Boundsfor Counting Triangles and Other Substructures in Graph Streams[C]//34th Symposium on Theoretical Aspects of Computer Science.Dagstuhl,Germany,2017,11:1-14. [8]LIM Y,JUNG M,KANG U.Memory-efficient and accuratesampling for counting local triangles in graph streams:from simple to multigraphs [J].ACM Transactions on Knowledge Discovery from Data,2018,12(1):1-28. [9]JUNG M,LEE S,LIM Y,et al.FURL:Fixed-memory and Uncertainty Reducing Local Triangle Counting for Graph Streams[J].Data Mining and Knowledge Discovery,2019,33(5):1225-1253. [10]SHIN K.WRS:Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams[C]//2017 IEEE International Conference on Data Mining.Washington,DC:IEEE Computer Society Press,2017:1087-1092. [11]BULTEAU L,FROESE V,KUTZKOV K,et al.TriangleCounting in Dynamic Graph Streams[J].Algorithmica,2016,76(1):259-278. [12]SHIN K,KIM J,HOOI B,et al.Think Before You Discard:Ac- curate Triangle Counting in Graph Streams with Deletions[C]//Joint European Conference on Machine Learning and Knowledge Discovery in Databases,Lecture Notes in Computer Science.2018:141-157. [13]DE STEFANI L,TEROLLI E,UPFAL E.Tiered Sampling:An Efficient Method for Approximate Counting Sparse Motifs in Massive Graph Streams[C]//2017 IEEE International Conference on Big Data.Washington,DC:IEEE Computer Society Press,2017:776-786. [14]WANG P,QI Y,SUN Y,et al.Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage[J].Proceedings of the VLDB Endowment,2017,11(2):162-175. [15]SHIN K,HAMMOUD M,LEE E,et al.Tri-Fly:Distributed Estimation of Global and Local Triangle Counts in Graph Streams[C]//Pacific-Asia Conference on Knowledge Discovery and Data Mining,Lecture Notes in Computer Science.2018:651-663. [16]SHIN K,LEE E,OH J,et al.DiSLR:Distributed Sampling with Limited Redundancy For Triangle Counting in Graph Streams[J].arXiv:1802.04249v1,2018. [17]PINGHUI W,PENG J,YIYAN Q,et al.A Streaming Algo-rithm of Approximating Global and Local Triangle Counts in Parellal[C]//35th IEEE International Conference on Data Engineering.Washington,DC:IEEE Computer Society Press,2019:758-769. [18]LORENZO D S,ALESSANDRO E,MATECO R,et al.TRIEST:Counting Local and Global Triangles in Fully-dynamic Streams with Fixed Memory Size[J].ACM Transactions on Knowledge Discovery from Data,2017,11(4):1-50. [19]JHA M.Counting Triangles in Graph Streams[J].Encyclopedia of Algorithms,2016:458-464. [20]VITTER J S.Random sampling with a reservoir[J].ACMTransactions on Mathematical Software,1985,11(1):37-57. [21]TANGWONGSAN K,PAVAN A,TIRTHAPURA S.ParallelTriangle Counting in Massive Streaming Graphs[C]//ACM international conference on Information and Knowledge Management.New York NY:ACM Press,2013:781-786. |
[1] | 龚建锋. 抗能量分析的带符号滑动窗口标量乘算法 Resisting Power Analysis Algorithm of Scalar Multiplication Based on Signed Sliding Window 计算机科学, 2021, 48(6A): 533-537. https://doi.org/10.11896/jsjkx.191200097 |
[2] | 宋莺, 钟忺, 孙宝林, 桂超. MANET中基于滑动窗口的网络编码协作算法 Sliding Window-based Network Coding Cooperative Algorithm in MANET 计算机科学, 2020, 47(11): 322-326. https://doi.org/10.11896/jsjkx.191000181 |
[3] | 彭成, 贺婧, 池昊. 一种确定滑动窗口规模的边界距离算法 Boundary Distance Algorithm for Determining Sliding Window Size 计算机科学, 2019, 46(6A): 482-487. |
[4] | 郭威, 于建江, 汤克明, 徐涛. 动态数据流分析的在线超限学习算法综述 Survey of Online Sequential Extreme Learning Algorithms for Dynamic Data Stream Analysis 计算机科学, 2019, 46(4): 1-7. https://doi.org/10.11896/j.issn.1002-137X.2019.04.001 |
[5] | 杜娟, 沈思昀. 基于改进多权值滑动窗口的立体匹配方法的实现及应用 Implementation and Application of Stereo Matching Method Based onImproved Multi-weight Sliding Window 计算机科学, 2019, 46(11A): 241-245. |
[6] | 毛义坪, 余磊, 官泽瑾. 基于分数阶微分的多聚焦图像融合 Multi-focus Image Fusion Based on Fractional Differential 计算机科学, 2019, 46(11A): 315-319. |
[7] | 李青,肖迎元,王晓晔,李玉坤. 无线传感器网络中基于聚簇结构的Skyline查询方法 Clustering Architecture-based Skyline Query Processing in Wireless Sensor Networks 计算机科学, 2017, 44(10): 177-181. https://doi.org/10.11896/j.issn.1002-137X.2017.10.033 |
[8] | 张建辉,王会青,孙宏伟,郭芷榕,白莹莹. 基于二分迭代SAX的时序相似性度量算法 Similarity Measure Algorithm of Time Series Based on Binary-dividing SAX 计算机科学, 2017, 44(1): 247-252. https://doi.org/10.11896/j.issn.1002-137X.2017.01.046 |
[9] | 李彦,周艺,刘雨声,梁智. 无线体域网中基于时隙滑动窗口的概率路由协议 Probability Routing Protocol Based on Slotted Sliding Window in Wireless Body Area Network 计算机科学, 2016, 43(Z11): 259-263. https://doi.org/10.11896/j.issn.1002-137X.2016.11A.060 |
[10] | 王丽珍,周丽华,邓世昆. 一种基于滑动窗口模式匹配的加权预测方法 Weighted Prediction Method Based on Sliding Window and Pattern Matching 计算机科学, 2016, 43(Z11): 591-596. https://doi.org/10.11896/j.issn.1002-137X.2016.11A.134 |
[11] | 钟忺,杨光,卢炎生. 基于双阈值滑动窗口子镜头分割和完全连通图的关键帧提取方法 Method of Key Frames Extraction Based on Double-threshold Values Sliding Window Sub-shot Segmentation and Fully Connected Graph 计算机科学, 2016, 43(6): 289-293. https://doi.org/10.11896/j.issn.1002-137X.2016.06.057 |
[12] | 曲晓雅,刘真. 基于阈值滑动窗口机制的虚拟机迁移判决算法 Dynamic Threshold Window Algorithm for Virtual Machines Migration Decision 计算机科学, 2016, 43(4): 64-69. https://doi.org/10.11896/j.issn.1002-137X.2016.04.013 |
[13] | 陈小东,孙力娟,韩崇,郭剑. 基于模糊聚类的数据流概念漂移检测算法 Detecting Concept Drift of Data Stream Based on Fuzzy Clustering 计算机科学, 2016, 43(4): 219-223. https://doi.org/10.11896/j.issn.1002-137X.2016.04.045 |
[14] | 吕太之. 基于滑动窗口Hurst指数的心电分析研究 Research of Heart Rate Variable Analysis Based on Sliding Window Hurst 计算机科学, 2016, 43(2): 259-262. https://doi.org/10.11896/j.issn.1002-137X.2016.02.054 |
[15] | 常东亚,严建峰,杨璐,刘晓升. 基于滑动窗口的主题模型 Sliding-window Based Topic Modeling 计算机科学, 2016, 43(12): 101-107. https://doi.org/10.11896/j.issn.1002-137X.2016.12.018 |
|