Computer Science, 2013, Vol. 40, Issue 6: 187-191.


Clustering Models and Algorithms for Distributed Data Streams Based on Data Synopsis

MAO Guo-jun and CAO Yong-cun   

Online: 2018-11-16. Published: 2018-11-16.

Abstract: Mining data streams aims to discover knowledge from large volumes of streaming data, and it has received considerable research attention in recent years. A typical example is the data collected by a sensor, which arrives in the form of a data stream. In a sensor network, however, multiple sensors are usually deployed and collect data in a distributed manner, which makes mining distributed data streams a challenging problem. Most existing studies on mining distributed data streams suffer from limited accuracy or efficiency. This paper discusses a model for clustering distributed data streams, including a new synopsis data structure for summarizing data streams and effective algorithms for the key mining phases. The rationale behind the presented algorithms is also discussed. Experimental results show that the presented models and algorithms incur lower transmission cost and achieve higher clustering quality when mining the global pattern from distributed data streams.

Key words: Distributed data stream, Data synopsis, Incremental clustering, Global pattern
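The abstract does not specify the proposed synopsis structure, so the following is only a generic illustration of the overall idea under stated assumptions, not the authors' algorithm. It assumes a BIRCH-style clustering feature CF = (N, LS, SS) (point count, per-dimension linear sum, sum of squared norms) as the synopsis. Because CFs are additive, each site can incrementally absorb its own stream into a few micro-clusters and transmit only those summaries, and a coordinator can merge them into global clusters. The class `LocalSite`, the function `global_merge`, and the distance `threshold` are hypothetical names and parameters introduced for this sketch; the coordinator step is a simple threshold-based (leader-style) merge chosen for brevity.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusterFeature:
    """Assumed BIRCH-style synopsis: count N, linear sum LS, squared sum SS."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusterFeature":
        return cls(1, list(x), sum(v * v for v in x))

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # RMS distance of members to the centroid: sqrt(SS/N - |LS/N|^2).
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))

    def merge(self, other: "ClusterFeature") -> None:
        # Additivity: CF1 + CF2 summarizes the union of both point sets.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class LocalSite:
    """Hypothetical sensor site that incrementally summarizes its own stream."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.synopsis: List[ClusterFeature] = []

    def absorb(self, x: Sequence[float]) -> None:
        # Add the point to the nearest micro-cluster if it is close enough,
        # otherwise open a new micro-cluster; raw points are never stored.
        if self.synopsis:
            nearest = min(self.synopsis, key=lambda cf: dist(cf.centroid(), x))
            if dist(nearest.centroid(), x) <= self.threshold:
                nearest.merge(ClusterFeature.from_point(x))
                return
        self.synopsis.append(ClusterFeature.from_point(x))


def global_merge(synopses: List[List[ClusterFeature]], threshold: float) -> List[ClusterFeature]:
    """Hypothetical coordinator step: merge transmitted CFs by centroid distance."""
    global_cfs: List[ClusterFeature] = []
    for site_cfs in synopses:
        for cf in site_cfs:
            target = next((g for g in global_cfs
                           if dist(g.centroid(), cf.centroid()) <= threshold), None)
            if target is None:
                # Copy so the sites' own synopses are not mutated.
                global_cfs.append(ClusterFeature(cf.n, list(cf.ls), cf.ss))
            else:
                target.merge(cf)
    return global_cfs


# Usage: two sites each send 2 CFs instead of their raw points.
site_a, site_b = LocalSite(threshold=1.0), LocalSite(threshold=1.0)
for p in [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]:
    site_a.absorb(p)
for p in [(0.1, -0.1), (5.2, 4.9)]:
    site_b.absorb(p)
clusters = global_merge([site_a.synopsis, site_b.synopsis], threshold=1.0)
for cf in clusters:
    print(cf.n, [round(v, 3) for v in cf.centroid()], round(cf.radius(), 3))
```

In this toy run the coordinator recovers two global clusters of 3 and 2 points from four transmitted summaries. Note that the simple merge is order-dependent; a real system would typically recluster the received CFs (for example with a weighted k-means over centroids) rather than rely on a single pass.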

