Computer Science ›› 2015, Vol. 42 ›› Issue (Z6): 459-461.


Parallel FP-growth Algorithm in Search Engines

HUANG Jian, LI Ming-qi and GUO Wen-qiang   

Online: 2018-11-14   Published: 2018-11-14

Abstract: Web log files record users' retrieval history. This paper studies whether query words and clicked links form frequent itemsets, and how efficiently frequent itemsets can be mined in a distributed setting. Based on the Hadoop framework, we designed a parallel FP-growth algorithm to mine search engine web logs. Simulation results show that frequent itemsets of query words and clicked links satisfying the given support are prevalent in web logs. As the number of nodes in the Hadoop cluster increases, the performance of the parallel FP-growth algorithm improves greatly, so the mining efficiency of frequent itemsets improves significantly. The simulation results also show that the larger the data volume, the more pronounced the improvement.

Key words: Log file, Frequent itemset, Hadoop, FP-growth
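
To make the notion of a frequent itemset "satisfying the given support" concrete, the following is a minimal single-machine sketch in Python. It is not the paper's algorithm and it is not FP-growth: it uses brute-force support counting, split into a per-shard map step and a merging reduce step, only to illustrate how counting can be distributed over log partitions. The function names, the record layout (query words plus a clicked link prefixed with `url:`), and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_shard(transactions, max_size=2):
    """Map step: count every itemset of up to max_size items in one shard."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # deduplicate and fix a canonical order
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce step: merge per-shard counts, keep itemsets meeting min_support."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical log records: query words plus the clicked link.
shard_a = [
    ["hadoop", "tutorial", "url:example.org/hadoop"],
    ["hadoop", "install", "url:example.org/hadoop"],
]
shard_b = [
    ["hadoop", "tutorial", "url:example.org/hadoop"],
    ["python", "tutorial", "url:example.org/python"],
]

# Each shard could be counted on a different node; here they run in sequence.
frequent = reduce_counts([map_shard(shard_a), map_shard(shard_b)], min_support=3)
```

With a minimum support of 3, the query word `hadoop` and the link `url:example.org/hadoop` are frequent together, while `hadoop` and `tutorial` co-occur only twice and are filtered out. A real FP-growth implementation would avoid enumerating candidate itemsets by building a compressed prefix tree instead.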

