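Abstract: Data stream mining is an effective approach to knowledge discovery in high-volume streaming data and has been widely studied. A typical example of a data stream is the streaming data collected by sensors. However, as sensor networks become widespread, such data are in many cases collected and managed in a distributed manner, which creates a need to mine data streams in a distributed way. The main obstacle in distributed data stream mining is the loss of mining quality or efficiency caused by distribution. To support clustering over distributed data streams, this paper discusses a mining model for distributed data streams, designs the corresponding synopsis data structure and key mining algorithms based on that model, and evaluates the algorithms through theoretical analysis or experimental validation. The experiments show that the proposed model and algorithms effectively reduce data communication cost while maintaining high clustering quality for the global patterns.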
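As a rough illustration of how a synopsis structure can reduce communication cost in this setting, the sketch below assumes a BIRCH-style additive cluster feature (count, linear sum, squared sum). This is an assumption made for illustration only, not the structure proposed in the paper, and `ClusterFeature`, `add`, `merge`, and `centroid` are hypothetical names. Each local site summarises its own stream, and only the fixed-size summaries are sent to a coordinator, which merges them.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClusterFeature:
    """Hypothetical additive synopsis: count, per-dimension linear sum,
    and sum of squared norms of the points seen so far."""
    n: int = 0
    ls: List[float] = field(default_factory=list)
    ss: float = 0.0

    def add(self, point: Sequence[float]) -> None:
        # Absorb one stream element without storing it.
        if not self.ls:
            self.ls = [0.0] * len(point)
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def merge(self, other: "ClusterFeature") -> None:
        # Summaries are additive, so merging is component-wise addition.
        if other.n == 0:
            return
        if not self.ls:
            self.ls = [0.0] * len(other.ls)
        self.n += other.n
        for i, x in enumerate(other.ls):
            self.ls[i] += x
        self.ss += other.ss

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]


# Two local sites summarise their own streams.
site_a = ClusterFeature()
for p in [(1.0, 2.0), (3.0, 4.0)]:
    site_a.add(p)

site_b = ClusterFeature()
site_b.add((5.0, 6.0))

# The coordinator receives only the summaries, never the raw points.
global_cf = ClusterFeature()
global_cf.merge(site_a)
global_cf.merge(site_b)
```

Because the summary has a fixed size regardless of how many points a site has seen, the amount of data sent to the coordinator does not grow with the length of the stream.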