一种基于压缩矩阵的Apriori算法改进研究

计算机科学 ›› 2013, Vol. 40 ›› Issue (12): 75-80.

一种基于压缩矩阵的Apriori算法改进研究

罗丹,李陶深

广西大学计算机与电子信息学院南宁530004;广西大学计算机与电子信息学院南宁530004

出版日期:2018-11-16 发布日期:2018-11-16
基金资助:
本文受国家自然科学基金项目(60973074)资助

Research on Improved Apriori Algorithm Based on Compressed Matrix

LUO Dan and LI Tao-shen

Online:2018-11-16 Published:2018-11-16

摘要/Abstract

摘要： 针对已有基于矩阵的Apriori算法存在的问题,提出了一种改进的基于压缩矩阵的Apriori算法。算法进行了以下方面的改进:增加了两个数组,分别用于记录矩阵行与列中1的个数,使得算法在压缩矩阵时减少了扫描矩阵的次数；在压缩矩阵中,通过增加删除不能连接的项集和非频繁的项集的操作,使得矩阵压缩得更小,提高了空间效率；改变了删除事务列的条件和算法结束的条件,以减少挖掘结果的误差和算法循环的次数。算法性能分析和实验分析证明,改进后的算法能有效地挖掘频繁项集,并且比现有的算法具有更高的计算效率。

关键词: 数据挖掘,频繁项集,Apriori算法,压缩矩阵

Abstract: Aiming at the deficiency of the existing Apriori algorithm,an improved Apriori algorithm based on compressed matrix called NCM_Apriori_1was proposed．The improvements of this algorithm are as follows:(1) adding two arrays to record the counts of 1in the row and column,so that the number of scanning the matrix can be reduced during compressing,(2)deleting the unnecessary itemsets which can’t be connected as well as the infrequent ones in compressing matrix to minify the scale of matrix and improve space utilization,(3)changing the condition of deleting the unnecessary transactions to reduce the errors of the mine result,and changing the stopping condition to make the number of cycle decreased．Algorithm performance analysis and experiments results prove that the improved algorithm can mine frequent itemsets effectively and has better efficiency of computing than existing Apriori algorithms based on compressed matrix.

Key words: Data mining,Frequent itemsets,Apriori algorithm,Compressed matrix

罗丹,李陶深. 一种基于压缩矩阵的Apriori算法改进研究[J]. 计算机科学, 2013, 40(12): 75-80. https://doi.org/

LUO Dan and LI Tao-shen. Research on Improved Apriori Algorithm Based on Compressed Matrix[J]. Computer Science, 2013, 40(12): 75-80. https://doi.org/

参考文献

[1] Agrawal R,Imielinaki T,Swami A．Mining association rules between sets items in large databases[C]∥Proceedings of the ACM SIGMOD Conference on Management of Date．Washington,D.C,1993:207-216
[2] Han J,Pei J,Yin Y．Mining Frequent Patterns without Candidate Generations[C]∥Proceedings of the 2000ACM SIGMOD International Conference on Management of Data．Dallas,Texas,USA,2000:1-12
[3] Zaki M J．Scalable Algorithms for Association Mining[J]．IEEE Transactions on Knowledge and Data Engineering,2000,12(3):372-390
[4] Park J S,Chen M-S,Yu P S．An effective hash-based algorithms for mining association rules[C]∥Proceedings of the 1995ACM SIGMOD International Conference on Management of Data．San Jose,California:ACM Press,1995:175-186
[5] 刘步中．基于频繁项集挖掘算法的改进与研究[J]．计算机应用研究,2012,29(2):475-477
[6] Wang Feng,Li Yong-hua．An improved Apriori algorithm based on the matix[C]∥2008International Seminat on Future BioMedical Information Engineering．Wuhan,2008:152-155
[7] 吕桃霞,刘培玉．一种基于矩阵的强关联规则生成算法[J]．计算机应用研究,2011,28(4):1301-1303
[8] 张笑达,徐立臻．一种改进的基于矩阵的频繁项集挖掘算法[J]．计算机技术与发展,2010,20(4):93-96
[9] 付沙,廖明华,宋丹．基于压缩矩阵方式的Apriori改进算法[J]．微电子学与计算机,2012,29(6):28-32
[10] Khare N,Adlakha N,Pardasani K R．An Algorithm for Mining Multidimensional Association Rules using Boolean Matrix[C]∥2010International Conference in Recent Trends in Information,Telecommunication and Computing．Kochi,Kerala,2010:95-99
[11] Liu Hui-zhen,Dai Shang-ping,Jiang Hong．Quantitative association rules mining algorithm based on matrix[C]∥2009International Conference on Computational Intelligence and Software Engineering．Wuhan,2009:1-4
[12] 何丽．基于规模压缩的关联规则数据挖掘算法研究[J]．计算机科学,2007,4(9):148-150
[13] 闫珍,皮德常,吴文昊．高维稀疏数据频繁项集挖掘算法的研究[J]．计算机科学,2011,8(6):183-186
[14] 李瑞,康良玉,耿浩．基于数组的关联规则算法的改进[J]．科学技术与工程,2008,8(21):5846-5849

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed