Computer Science ›› 2015, Vol. 42 ›› Issue (Z6): 470-473.

Parallel PSO-kmeans Algorithm Implementing Web Log Mining Based on Hadoop

MA Han-da, HAO Xiao-yu and MA Ren-qing   

  • Online: 2018-11-14 Published: 2018-11-14

Abstract: With the rapid development of Internet technology, Web log mining on a single node has become very difficult. The emergence of the Hadoop cloud platform provides a new solution to this problem. However, k-means, the traditional clustering algorithm for Web log mining, is sensitive to the selection of the initial cluster centers, which easily degrades clustering accuracy. To address this problem, this paper proposes a k-means algorithm based on particle swarm optimization (PSO), so that the clustering result is no longer affected by the choice of initial cluster centers. The algorithm is implemented on the Hadoop MapReduce programming platform. Experimental results show that the proposed algorithm achieves higher clustering accuracy than the traditional k-means algorithm, and that its running efficiency is greatly improved compared with the stand-alone serial algorithm.
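
The abstract does not give the details of the hybrid, so the following is only a minimal single-process sketch of one common way to combine PSO with k-means: each particle encodes a full set of k candidate centers, the swarm minimizes the sum of squared errors, and the best particle seeds ordinary k-means iterations. The function names (`pso_seed`, `kmeans`, `pso_kmeans`), the swarm size, the inertia weight `w`, and the coefficients `c1` and `c2` are illustrative assumptions, not values from the paper, and nothing here uses Hadoop; the comments only mark where an assignment step and an averaging step would naturally split into map and reduce phases.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centers):
    """Clustering cost: each point's squared distance to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def pso_seed(points, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Search for initial centers with PSO; each particle is a list of k centers.
    All parameter defaults are assumptions chosen for illustration."""
    rng = rng or random.Random(0)
    dim = len(points[0])
    swarm = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in part] for part in swarm]
    pbest_cost = [sse(points, part) for part in swarm]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = [c[:] for c in pbest[g]], pbest_cost[g]
    for _ in range(iters):
        for i, part in enumerate(swarm):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard PSO velocity update: inertia + personal pull + global pull.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - part[j][d])
                                    + c2 * r2 * (gbest[j][d] - part[j][d]))
                    part[j][d] += vel[i][j][d]
            cost = sse(points, part)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = [c[:] for c in part], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = [c[:] for c in part], cost
    return gbest

def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        # "Map"-like step: assign each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            groups[idx].append(p)
        # "Reduce"-like step: average each group; keep the old center if it is empty.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def pso_kmeans(points, k, seed=0):
    """PSO picks the starting centers, then k-means refines them."""
    rng = random.Random(seed)
    return kmeans(points, pso_seed(points, k, rng=rng))
```

Calling `pso_kmeans(points, k)` on a list of numeric tuples returns k refined centers. Because the k-means refinement never increases the sum of squared errors, the final cost is no worse than that of the PSO seed it started from.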

Key words: Hadoop, k-means, PSO, MapReduce, Web log mining

