Computer Science ›› 2020, Vol. 47 ›› Issue (10): 83-90.doi: 10.11896/jsjkx.190900014

• Database & Big Data & Data Science • Previous Articles     Next Articles

Study of Triangle Counting Algorithm with Sliding Windows Based on FLINK

WANG Xu1, YANG Xiao-chun2   

  1. 1 School of Computer Science and Technology,Beijing Institute of Technology,Beijing 100081,China
    2 School of Computer and Engineering,Northeastern University,Shenyang 110819,China
  • Received:2019-09-02 Revised:2019-11-04 Online:2020-10-15 Published:2020-10-16
  • About author:WANG Xu,born in 1997,postgraduate,is a member of China Computer Federation.Her main research interests include data stream processing and so on.
    YANG Xiao-chun,born in 1973,professor,doctoral supervisor,is a member of China Computer Federation.Her main research interests include databasetheo-ry and systems,data quality proces-sing,text and data stream processing,and AI enabled data systems.
  • Supported by:
    National Natural Science Foundation of China (61572122,61532021)

Abstract: Triangle Counting,calculating global and local triangle counts,is an important work in data mining,whose number is widely used in important role identification,recommendation systems,community discovery,spam and fraud detection etc.In the graphs presented as a stream of edges,edges are temporal,and there are a large of duplicate edges in real-world graphs.For full use of the time information in the graph and mining the network knowledge,this work studied the problem of estimating global and local triangle counts on a multigraph stream with sliding windows,that simultaneously studied multiple windows by window mechanism to obtain more information in implicit time relationships.A triangle counting algorithm based on FLINK window ope-ration and a triangle increment counting algorithm based on sliding window are proposed.Like the existing edge sampling work,the edge set is used to store window history data for accurately calculating global and local triangle counts on a multigraph stream with sliding windows in one-pass.The triangle counting algorithm based on FLINK window operator uses the window mechanism provided by FLINK.While the triangle increment counting algorithm based on sliding window realizes window counting by calculating slide-in and slide-out data through the window,reducing a large number of repeated calculations of coincident edges between adjacent windows,seamlessly processing multiple time windows,and for duplicate data in slide-in and slide-out,uses a deduplication mechanism to further reduce the calculation.The theory has been proven that both the algorithms can accurately count the triangles in the sliding window,and the effects of window size,sliding distance,data distribution and data flow rate on window processing time were analyzed through experiments.Compared with the TRIEST algorithm,when the window is small,the triangle counting algorithm based on FLINK window operation and the triangle increment counting algorithm based on sliding window have faster speed.When the window is large,the accuracy of the calculation result is guaranteed.

Key words: Accurate stream algorithm, FLINK, Graph stream mining, Sliding window, Triangle counting

CLC Number: 

  • TP301.6
[1]WELSER H T,GLEAVE E,FISHER D,et al.Visualizing the signatures of social roles in online discussion groups[J].Journal of Social Structure,2007,8(2):1-32.
[2]TSOURAKAKIS C E,DRINEAS P,MICHELAKIS E,et al.Spectral counting of triangles via element-wise sparsifification and triangle-based link recommendation[J].Social Network Analysis & Mining,2011,1(2):75-81.
[3]GLEICH D F,SESHADHRI C.Vertex neighborhoods,low conductance cuts,and good seeds for local community methods[C]//18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York NY:ACM Press,2012:597-605.
[4]BECCHETTI L,BOLDI P,CASTILLO C,et al.Efficient semi-streaming algorithms for local triangle counting in massive graphs[C]//ACM Conference on Knowledge Discovery and Data Mining.New York NY:ACM Press,2008:16-24.
[5]KAVASSERY-PARAKKAT N,HANJANI K M,PAVAN A.Improved Triangle Counting in Graph Streams:Power of Multi-Sampling[C]//IEEE International Conference on Advances in Social Networks Analysis and Mining.Washington,DC:IEEE Computer Society Press,2018:28-31,33.
[6]JHA M,SESHADHRI C,PINAR A.Counting Triangles in Real-World Graph Streams:Dealing with Repeated Edges and Time Windows[C]//49th Asilomar Conference on Signals,Systems and Computers.Washington DC:IEEE Computer Society Press,2015:1507-1514.
[7]BERA S K,CHAKRABARTI A.Towards Tighter Space Boundsfor Counting Triangles and Other Substructures in Graph Streams[C]//34th Symposium on Theoretical Aspects of Computer Science.Dagstuhl,Germany,2017,11:1-14.
[8]LIM Y,JUNG M,KANG U.Memory-efficient and accuratesampling for counting local triangles in graph streams:from simple to multigraphs [J].ACM Transactions on Knowledge Discovery from Data,2018,12(1):1-28.
[9]JUNG M,LEE S,LIM Y,et al.FURL:Fixed-memory and Uncertainty Reducing Local Triangle Counting for Graph Streams[J].Data Mining and Knowledge Discovery,2019,33(5):1225-1253.
[10]SHIN K.WRS:Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams[C]//2017 IEEE International Conference on Data Mining.Washington,DC:IEEE Computer Society Press,2017:1087-1092.
[11]BULTEAU L,FROESE V,KUTZKOV K,et al.TriangleCounting in Dynamic Graph Streams[J].Algorithmica,2016,76(1):259-278.
[12]SHIN K,KIM J,HOOI B,et al.Think Before You Discard:Ac-
curate Triangle Counting in Graph Streams with Deletions[C]//Joint European Conference on Machine Learning and Knowledge Discovery in Databases,Lecture Notes in Computer Science.2018:141-157.
[13]DE STEFANI L,TEROLLI E,UPFAL E.Tiered Sampling:An Efficient Method for Approximate Counting Sparse Motifs in Massive Graph Streams[C]//2017 IEEE International Conference on Big Data.Washington,DC:IEEE Computer Society Press,2017:776-786.
[14]WANG P,QI Y,SUN Y,et al.Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage[J].Proceedings of the VLDB Endowment,2017,11(2):162-175.
[15]SHIN K,HAMMOUD M,LEE E,et al.Tri-Fly:Distributed Estimation of Global and Local Triangle Counts in Graph Streams[C]//Pacific-Asia Conference on Knowledge Discovery and Data Mining,Lecture Notes in Computer Science.2018:651-663.
[16]SHIN K,LEE E,OH J,et al.DiSLR:Distributed Sampling with Limited Redundancy For Triangle Counting in Graph Streams[J].arXiv:1802.04249v1,2018.
[17]PINGHUI W,PENG J,YIYAN Q,et al.A Streaming Algo-rithm of Approximating Global and Local Triangle Counts in Parellal[C]//35th IEEE International Conference on Data Engineering.Washington,DC:IEEE Computer Society Press,2019:758-769.
[18]LORENZO D S,ALESSANDRO E,MATECO R,et al.TRIEST:Counting Local and Global Triangles in Fully-dynamic Streams with Fixed Memory Size[J].ACM Transactions on Knowledge Discovery from Data,2017,11(4):1-50.
[19]JHA M.Counting Triangles in Graph Streams[J].Encyclopedia of Algorithms,2016:458-464.
[20]VITTER J S.Random sampling with a reservoir[J].ACMTransactions on Mathematical Software,1985,11(1):37-57.
[21]TANGWONGSAN K,PAVAN A,TIRTHAPURA S.ParallelTriangle Counting in Massive Streaming Graphs[C]//ACM international conference on Information and Knowledge Management.New York NY:ACM Press,2013:781-786.
[1] GONG Jian-feng. Resisting Power Analysis Algorithm of Scalar Multiplication Based on Signed Sliding Window [J]. Computer Science, 2021, 48(6A): 533-537.
[2] SONG Ying, ZHONG Xian, SUN Bao-lin, GUI Chao. Sliding Window-based Network Coding Cooperative Algorithm in MANET [J]. Computer Science, 2020, 47(11): 322-326.
[3] PENG Cheng, HE Jing, CHI Hao. Boundary Distance Algorithm for Determining Sliding Window Size [J]. Computer Science, 2019, 46(6A): 482-487.
[4] GUO Wei, YU Jian-jiang, TANG Ke-ming, XU Tao. Survey of Online Sequential Extreme Learning Algorithms for Dynamic Data Stream Analysis [J]. Computer Science, 2019, 46(4): 1-7.
[5] MAO Yi-ping, YU Lei, GUAN Ze-jin. Multi-focus Image Fusion Based on Fractional Differential [J]. Computer Science, 2019, 46(11A): 315-319.
[6] DU Juan, SHEN Si-yun. Implementation and Application of Stereo Matching Method Based onImproved Multi-weight Sliding Window [J]. Computer Science, 2019, 46(11A): 241-245.
[7] LI Qing, XIAO Ying-yuan, WANG Xiao-ye and LI Yu-kun. Clustering Architecture-based Skyline Query Processing in Wireless Sensor Networks [J]. Computer Science, 2017, 44(10): 177-181.
[8] ZHANG Jian-hui, WANG Hui-qing, SUN Hong-wei, GUO Zhi-rong and BAI Ying-ying. Similarity Measure Algorithm of Time Series Based on Binary-dividing SAX [J]. Computer Science, 2017, 44(1): 247-252.
[9] LI Yan, ZHOU Yi, LIU Yu-sheng and LIANG Zhi. Probability Routing Protocol Based on Slotted Sliding Window in Wireless Body Area Network [J]. Computer Science, 2016, 43(Z11): 259-263.
[10] WANG Li-zhen, ZHOU Li-hua and DENG Shi-kun. Weighted Prediction Method Based on Sliding Window and Pattern Matching [J]. Computer Science, 2016, 43(Z11): 591-596.
[11] ZHONG Xian, YANG Guang and LU Yan-sheng. Method of Key Frames Extraction Based on Double-threshold Values Sliding Window Sub-shot Segmentation and Fully Connected Graph [J]. Computer Science, 2016, 43(6): 289-293.
[12] LI Yi-ming, NI Li-ping, FANG Qing-hua and LIU Hui-ting. Research of Text Data Streams Clustering Algorithm Based on Affinity Propagation [J]. Computer Science, 2016, 43(5): 223-229.
[13] CHEN Xiao-dong, SUN Li-juan, HAN Chong and GUO Jian. Detecting Concept Drift of Data Stream Based on Fuzzy Clustering [J]. Computer Science, 2016, 43(4): 219-223.
[14] LV Tai-zhi. Research of Heart Rate Variable Analysis Based on Sliding Window Hurst [J]. Computer Science, 2016, 43(2): 259-262.
[15] CHANG Dong-ya, YAN Jian-feng, YANG Lu and LIU Xiao-sheng. Sliding-window Based Topic Modeling [J]. Computer Science, 2016, 43(12): 101-107.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!