Computer Science, 2017, Vol. 44, Issue (7): 262-266. doi: 10.11896/j.issn.1002-137X.2017.07.046

Research on Improved Apriori Algorithm Based on Hadoop

HUANG Jian, LI Ming-qi and GUO Wen-qiang   

  • Online: 2018-11-13 Published: 2018-11-13

Abstract: Traditional mining with parallel Apriori algorithms spends more and more time on data I/O as the transaction database grows. In this paper, we improve the Apriori algorithm in three respects: compressing the transactions, reducing the number of scans, and simplifying candidate set generation. We describe the transactions with a Boolean matrix model whose entries are “0” and “1”, and introduce a weight dimension to compress the size of the transaction matrix. Meanwhile, the matrix is pruned dynamically, and the “AND” operation on the matrix is applied to generate the candidate sets. Experiments with the improved algorithm running in parallel on the Hadoop framework show that it is suitable for large-scale data mining and has good scalability and effectiveness.
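The ideas named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the function names, the choice to merge identical transactions into one weighted row, and the row-pruning rule are assumptions made for illustration, and the Hadoop parallelization is left out entirely.

```python
from collections import Counter
from itertools import combinations


def build_weighted_matrix(transactions):
    """Build a 0/1 matrix with one row per distinct transaction.

    Assumption: the "weight" of a row is the number of identical
    transactions it stands for, which is how this sketch compresses
    the matrix.
    """
    items = sorted({item for t in transactions for item in t})
    counts = Counter(frozenset(t) for t in transactions)
    rows, weights = [], []
    for itemset, weight in counts.items():
        rows.append([1 if item in itemset else 0 for item in items])
        weights.append(weight)
    return items, rows, weights


def support(columns, rows, weights):
    """Weighted support: AND the chosen columns row by row, sum the weights."""
    total = 0
    for row, weight in zip(rows, weights):
        if all(row[c] for c in columns):  # logical AND over the columns
            total += weight
    return total


def frequent_itemsets(transactions, min_support):
    """Level-wise search over the weighted Boolean matrix (hypothetical helper)."""
    items, rows, weights = build_weighted_matrix(transactions)
    frequent = {}
    k = 1
    candidates = [(c,) for c in range(len(items))]
    while candidates:
        # Pruning: a row with fewer than k ones cannot contain a k-itemset.
        kept = [(r, w) for r, w in zip(rows, weights) if sum(r) >= k]
        rows = [r for r, _ in kept]
        weights = [w for _, w in kept]

        level = []
        for cols in candidates:
            s = support(cols, rows, weights)
            if s >= min_support:
                level.append(cols)
                frequent[tuple(items[c] for c in cols)] = s

        # Join frequent k-itemsets that differ in exactly one item.
        k += 1
        candidates = sorted(
            {
                tuple(sorted(set(a) | set(b)))
                for a, b in combinations(level, 2)
                if len(set(a) | set(b)) == k
            }
        )
    return frequent
```

For example, `frequent_itemsets([["a", "b"], ["a", "b"], ["b", "c"]], 2)` stores the two identical transactions as a single row with weight 2, so the support of an itemset is read from the compressed matrix without scanning the original transactions again.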

Key words: Apriori algorithm, Transaction database, Boolean matrix, Hadoop

