Computer Science, 2015, Vol. 42, Issue (Z6): 459-461.

• Data Mining •

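Parallel Fp-growth Algorithm in Search Engines

HUANG Jian, LI Ming-qi and GUO Wen-qiang

  1. School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731; School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731; School of Computer Science and Engineering, Xinjiang University of Finance and Economics, Urumqi 830012
  • Published: 2018-11-14  Online: 2018-11-14
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61163066).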


Abstract: Search engines record each user's historical retrieval process in Web log files. This paper studies whether the query words and clicked links in these logs form frequent itemsets, and how efficiently frequent itemsets can be mined in a distributed setting. Based on the Hadoop framework, we designed a parallel Fp-growth algorithm to mine search engine Web logs. Simulation results show that frequent itemsets of query words and clicked links that satisfy the given support threshold are prevalent in Web logs. As the number of Hadoop nodes increases, the performance of the parallel Fp-growth algorithm improves greatly, so the efficiency of frequent itemset mining rises significantly. The results also show that the larger the data volume, the more pronounced the improvement.

Key words: Log file, Frequent itemset, Hadoop, Fp-growth
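The abstract does not describe how Fp-growth is split across Hadoop nodes. As a rough illustration only, the sketch below simulates the commonly used group-based parallel FP-growth (PFP) decomposition in a single Python process. This decomposition is an assumption and is not necessarily the authors' design.

In the sketch:
- Each search session is one transaction.
- Query words are prefixed `q:` and clicked links `u:`. This is a hypothetical encoding.
- `min_count` is an absolute support threshold.
- The round-robin `group_of` mapping and every function name (`build_tree`, `fp_growth`, `parallel_fp_growth`) are hypothetical.

The three stages stand in for the map/reduce jobs:
1. Global counting.
2. A mapper that emits group-dependent transaction prefixes.
3. A reducer that runs ordinary FP-growth per group.

```python
from collections import Counter, defaultdict
from itertools import chain


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""
    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(paths, order, min_count):
    """Insert weighted (items, count) paths into an FP-tree; return its header table."""
    counts = Counter()
    for items, c in paths:
        for i in items:
            counts[i] += c
    root, header = Node(None, None), defaultdict(list)
    for items, c in paths:
        node = root
        # Keep only locally frequent items, in global F-list order (most frequent first).
        kept = sorted((i for i in items if counts[i] >= min_count), key=order.__getitem__)
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += c
            node = child
    return header


def fp_growth(paths, order, min_count, suffix, out, seeds=None):
    """Recursive FP-growth; `seeds` restricts which items start a conditional pass."""
    header = build_tree(paths, order, min_count)
    for item in (seeds if seeds is not None else list(header)):
        nodes = header.get(item, [])
        support = sum(n.count for n in nodes)
        if support < min_count:
            continue
        pattern = suffix | {item}
        out[pattern] = support
        # Conditional pattern base: the prefix path above each node, weighted by its count.
        base = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        if base:
            fp_growth(base, order, min_count, pattern, out)


def parallel_fp_growth(transactions, min_count, num_groups=2):
    """Single-process simulation of the three PFP stages (hypothetical design)."""
    # Stage 1 (parallel counting): global item supports and the F-list.
    counts = Counter(chain.from_iterable(set(t) for t in transactions))
    flist = sorted((i for i, c in counts.items() if c >= min_count),
                   key=lambda i: (-counts[i], i))
    order = {item: rank for rank, item in enumerate(flist)}
    group_of = {item: rank % num_groups for rank, item in enumerate(flist)}  # hypothetical
    # Stage 2 (mapper): per group, emit the transaction prefix ending at its last item.
    shards = defaultdict(list)
    for t in transactions:
        items = sorted((i for i in set(t) if i in order), key=order.__getitem__)
        emitted = set()
        for pos in range(len(items) - 1, -1, -1):
            g = group_of[items[pos]]
            if g not in emitted:
                emitted.add(g)
                shards[g].append((items[: pos + 1], 1))
    # Stage 3 (reducer): each group mines itemsets whose least frequent item it owns.
    result = {}
    for g, paths in shards.items():
        seeds = [i for i in flist if group_of[i] == g]
        fp_growth(paths, order, min_count, frozenset(), result, seeds)
    return result


if __name__ == "__main__":
    # Hypothetical sessions: query words (q:) and clicked links (u:) per search.
    sessions = [
        {"q:hadoop", "u:hadoop.apache.org"},
        {"q:hadoop", "u:hadoop.apache.org", "q:fp-growth"},
        {"q:hadoop", "q:fp-growth"},
        {"q:weather", "u:weather.example"},
        {"q:hadoop", "u:hadoop.apache.org"},
    ]
    found = parallel_fp_growth(sessions, min_count=2, num_groups=2)
    for itemset, support in sorted(found.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(itemset), support)
```

Grouping lets each reducer build a small FP-tree from only the prefixes it needs. Every frequent itemset is counted in exactly one group: the group that owns its least frequent item. In this sketch, adding groups (and, on a real cluster, nodes) therefore splits the mining work without changing the result.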

