Computer Science ›› 2011, Vol. 38 ›› Issue (Z10): 146-149.

• CRSSC-CWI-CGrC2015 •

Web Data Mining Based on Cloud Computing

CHENG Miao

  1. (School of Management, University of Science and Technology of China, Hefei 230026)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: The Internet is a huge, widely distributed information service center. The massive data generated on it are usually geographically distributed, heterogeneous, and dynamic, and their complexity keeps increasing, so existing centralized data mining methods cannot meet application requirements. To solve these problems, this paper proposes a cloud computing-based Web data mining method in which the massive data and the mining tasks are decomposed across multiple servers and processed in parallel. Using the open-source Hadoop platform, a parallel association rule mining algorithm based on Apriori is built to verify the efficiency of the system. The paper also proposes the design idea of "migrating computation to storage": computation is executed in place on the nodes that store the data, which avoids transferring large amounts of data over the network and does not consume much bandwidth.

Key words: Cloud computing, Data mining, Map/Reduce, Association rules
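
The parallel Apriori idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and does not use Hadoop: it simulates the map and reduce phases in a single Python process, and all function names (`map_count`, `reduce_counts`, `generate_candidates`, `apriori_mapreduce`) and the sample data are hypothetical. Each partition stands in for data held on one storage node; the map step counts candidate itemsets locally, so only the small count tables, not the raw transactions, would need to cross the network.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates):
    """Map step (hypothetical): count candidate itemsets inside one local partition."""
    local = Counter()
    for transaction in partition:
        items = set(transaction)
        for candidate in candidates:
            if items.issuperset(candidate):
                local[candidate] += 1
    return local


def reduce_counts(partial_counts, min_support):
    """Reduce step (hypothetical): sum per-partition counts, keep frequent itemsets."""
    total = Counter()
    for local in partial_counts:
        total.update(local)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def generate_candidates(frequent_prev, k):
    """Build k-itemset candidates whose (k-1)-subsets are all frequent (Apriori pruning)."""
    prev = set(frequent_prev)
    items = sorted({item for itemset in prev for item in itemset})
    return [
        c for c in combinations(items, k)
        if all(sub in prev for sub in combinations(c, k - 1))
    ]


def apriori_mapreduce(partitions, min_support):
    """Run one simulated map/reduce round per itemset size until nothing is frequent."""
    all_items = sorted({item for part in partitions for t in part for item in t})
    candidates = [(item,) for item in all_items]
    result = {}
    k = 1
    while candidates:
        # In a real cluster each map_count call would run on the node storing the partition.
        partials = [map_count(part, candidates) for part in partitions]
        frequent = reduce_counts(partials, min_support)
        if not frequent:
            break
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent.keys(), k)
    return result


# Illustrative data only: two partitions standing in for two storage nodes.
partitions = [
    [["a", "b"], ["a", "c", "d"]],
    [["b", "c", "d"], ["a", "b", "c"], ["a", "b", "d"]],
]
frequent = apriori_mapreduce(partitions, min_support=3)
```

With a minimum support count of 3, the only frequent pair in this toy data is `("a", "b")`; every single item is frequent on its own. Association rules would then be derived from these frequent itemsets in a separate step that this sketch omits.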
