计算机科学 ›› 2017, Vol. 44 ›› Issue (3): 10-15.doi: 10.11896/j.issn.1002-137X.2017.03.003

• 云计算 • 上一篇    下一篇

云环境下的突发关键字查询算法

郑诗敏,秦小麟,刘亮,周倩   

  1. 南京航空航天大学计算机科学与技术学院 南京211106,南京航空航天大学计算机科学与技术学院 南京211106,南京航空航天大学计算机科学与技术学院 南京211106,南京航空航天大学计算机科学与技术学院 南京211106
  • 出版日期:2018-11-13 发布日期:2018-11-13
  • 基金资助:
    本文受国家自然科学基金项目(61373015,61300052),江苏高校优势学科建设工程资助

Algorithm for Bursty Term Query in Cloud Computing

ZHENG Shi-min, QIN Xiao-lin, LIU Liang and ZHOU Qian   

  • Online:2018-11-13 Published:2018-11-13

摘要: 基于Spark Streaming计算框架下的分布式突发关键字查询是监测流数据中关键字突发时间的热点研究问题。多数研究方法存储统计所有的关键字,并未考虑热点关键字。在数据呈爆炸式增长的背景下,获取热点关键字的突发时间更具有价值。针对这个问题,提出一种分布式突发关键字查询算法,该算法采用动态的更新策略,通过设置检查点的方法提取热点关键字,并在线性的时间内查询突发的时间范围。实验结果表明,该算法的性能比现有算法更优。

关键词: 云计算,突发关键字查询,流数据,Spark Streaming

Abstract: Distributed bursty term query under the framework of Spark Streaming is a hot research issue.It aims to detect bursty terms in data streams.Most studies of bursty term query count and save all terms without consideration of hot terms.Under the background of exploding in the data scale,it makes more sense to get bursty time of hot terms.To solve this problem,we presented a distributed bursty term query algorithm.The algorithm uses dynamic update strategy and a checkpoint mechanism to extract hot terms.Also it finds the bursty time range in linear time.Experimental results show that the proposed algorithm has better performance.

Key words: Cloud computing,Bursty term query,Data streams,Spark Streaming

[1] XIE R Q,ZHU F D,MA H,et al.A Real-time Online Observatory for Bursty and Viral Events[J].PVLDB,2014,7(13):1637-1640.
[2] YAMAMOTO Y,IWANUMA K,FUKUDA S.Resource-oriented approximation for frequent itemset mining from bursty data streams[M].SIGMOD,2014:205-216.
[3] LAPPAS T,VIEIRA M R,GUNOPULOS D,et al.On The Spatiotemporal Burstiness of Terms[J].PVLDB,2012,5(9):836-847.
[4] LAPPAS T,ARAI B,PLATAKIS M,et al.On burstiness-aware search for document sequences[C]∥KDD.2009:477-486.
[5] ZAHARIA V,DAS T,LI H Y,et al.Discretized streams:fault-tolerant streaming computation at scale[C]∥SOSP.2013:423-438.
[6] DEAN J,GHEMAWAT S.MapReduce:simplified data proces-sing on large clusters[J].Commun.ACM (CACM),2008,51(1):107-113.
[7] XIANG L H,CHEN Y W,ZHANG Y L.High Order MatrixMultiplication by Mapreduce Algorithm Based on Hadoop Platform[J].Computer Science,2013,40(S1):96-98.(in Chinese) 向林泓,陈芋文,张昱琳.基于Hadoop平台的高阶矩阵相乘MapReduce算法研究[J].计算机科学,2013,40(S1):96-98.
[8] YAN Y L,DONG Y H,HE X M,et al.FSMBUS:A Frequent Subgraph Mining Algorithm in Single Large-Scale Graph Using Spark[J].Journal of Computer Research and Development,2015,52(8):1768-1783.(in Chinese) 严玉良,董一鸿,何贤芒,等.FSMBUS:一种基于Spark的大规模频繁子图挖掘算法[J].计算机研究与发展,2015,52(8):1768-1783.
[9] MANKU G S,MOTWANI R.Approximate Frequency Counts over Data Streams[J].Proceedings of the VLDB Endowment,2003,5(1):1699.
[10] DEMAINE E D,LPEZ-ORTIZ A,MUNRO J I.Frequency Estimation of Internet Packet Streams with Limited Space[C]∥ESA.2002:348-360.
[11] KARP R M,SHENKER S,PAPADIMTRIOU C H.A simple algorithm for finding frequent elements in streams and bags[J].ACM Trans.Database Syst.(TODS),2003,28:51-55.
[12] MISRA J,GRIES D.Finding Repeated Elements[J].Science of Computer Programming,1982,2(2):143-152.
[13] METWALLY A,AGRAWAL D,ABBADI A E.Efficient Computation of Frequent and Top-k Elements in Data Streams[C]∥ ICDT.2005:398-412.
[14] CHARIKAR M,CHEN K,FARACH-COLTON M.Finding Fre-quent Items in Data Streams[J].Theoretical Computer Science,2002,2(1):3-15.
[15] CORMODE G,MUTHUKRISHNAN S.What’s hot and what’snot:tracking most frequent items dynamically[J].ACM Trans.Database Syst.(TODS),2005,30(1):249-278.
[16] CORMODE G,MUTHUKRISHNAN S.An improved datastream summary:the count-min sketch and its applications[J].J.Algorithms (JAL),2005,55(1):58-75.
[17] KLEINBERG J.Bursty and hierarchical structure in streams[C]∥Proceeding of the Eighth ACM SIGKDD International Confe-rence on Knowledge Discovery and Data Mining.2002:91-101.
[18] PUI G, FUNG C,YU J X,et al.Parameter Free Bursty Events Detection in Text Streams[C]∥VLDB.2005:181-192.
[19] HE Q,CHANG K Y,LIM E P.Analyzing feature trajectories for event detection[C]∥SIGIR.2007:207-214.
[20] VLACHOS M,MEEK C,VAGENA Z Z,et al.Identifying Similarities,Periodicities and Bursts for Online Search Queries[C]∥SIGMOD.2004:131-142.
[21] LAPPAS T,ARAI B,PLATAKIS M,et al.On burstiness-aware search for document sequences[C]∥ACM SIGKD International Conference on Knowledge Discovery and Data Mining.Paris,France,2009:477-486.
[22] DOBKIN D P,EPPSTEIN D.Computing the Discrepancy[C]∥Symposium on Computational Geometry.1993:47-52.
[23] ARLITT M,JIN T.World Cup Web Site Access Logs.http://ita.ee.lbl.gov/html/contrib/WorldCup.html.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!