Research on Frequent Itemsets Mining Algorithm Based on High-dimensional Sparse Dataset

Computer Science, 2011, Vol. 38, Issue (6): 183-186.


YAN Zhen, PI De-chang, WU Wen-hao

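(College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016); (School of Computer Science and Technology, Fudan University, Shanghai 200433)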

Published: 2018-11-16; Online: 2018-11-16
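Funding:
This work was supported by the National Defense Technology Basic Research program and by the National High Technology Research and Development Program of China (863 Program) under project 2007AA01Z404.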


Abstract

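Traditional mining algorithms are not suited to mining high-dimensional sparse datasets. This paper proposes FIHS (Frequent mining algorithm based on High-dimensional Sparse dataset), a frequent itemset mining algorithm for high-dimensional sparse data. FIHS introduces a new data structure for storing frequent itemsets that both reduces storage space and lowers the cost of support counting. The algorithm scans the dataset only once. It avoids generating infrequent candidate itemsets by optimizing the join and pruning operations, and it generates frequent (K+1)-itemsets from frequent K-itemsets using AND and OR operations. The data structure is also easy to maintain. Theoretical analysis and experiments show that, on high-dimensional sparse datasets, the algorithm mines faster and uses less storage space.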

Keywords: high-dimensional data, sparse data, frequent itemsets, storage structure
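The abstract does not describe the FIHS data structure in detail, so the following is only a rough illustration of the general idea it names: one scan of the data, join and prune of K-itemsets, and bitwise operations to obtain (K+1)-itemsets. It assumes a common vertical bitmap layout, swapped in for illustration. Each item maps to a Python `int` whose bit `t` is set when transaction `t` contains the item. OR is used only to set bits during the single scan, and AND intersects bitmaps to count support. How FIHS itself combines AND and OR is not stated in the abstract and is not reproduced here. The function names `build_bitmaps`, `support`, and `mine_frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Single pass over the data (assumed vertical bitmap layout, not the paper's).

    Returns a dict mapping each item to an int whose bit t is set when
    transaction t contains that item. OR sets the bit for each occurrence.
    """
    bitmaps = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap):
    """Support = number of set bits (transactions containing the itemset)."""
    return bin(bitmap).count("1")


def mine_frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) mining over vertical bitmaps.

    Items must be mutually comparable (e.g. all str) so itemsets can be kept
    as sorted tuples. Returns {itemset_tuple: support}.
    """
    item_bitmaps = build_bitmaps(transactions)
    # Level 1: frequent single items.
    level = {(item,): bm for item, bm in item_bitmaps.items()
             if support(bm) >= min_support}
    result = {itemset: support(bm) for itemset, bm in level.items()}
    k = 1
    while level:
        next_level = {}
        keys = sorted(level)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Join step: only K-itemsets sharing their first K-1 items.
                # Sorted order keeps such itemsets contiguous, so stop early.
                if a[:-1] != b[:-1]:
                    break
                candidate = a + (b[-1],)
                # Prune step: every K-subset of the candidate must be frequent.
                if any(sub not in level for sub in combinations(candidate, k)):
                    continue
                # AND of the two bitmaps = transactions containing the candidate.
                bm = level[a] & level[b]
                if support(bm) >= min_support:
                    next_level[candidate] = bm
        result.update((s, support(bm)) for s, bm in next_level.items())
        level = next_level
        k += 1
    return result
```

For example, the five transactions `[["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]` with `min_support=3` yield support 4 for each single item and 3 for each pair. The candidate `("a", "b", "c")` survives pruning but is dropped because its AND-ed bitmap has only 2 set bits. Python ints are dense bitsets, so a real implementation for very sparse data would likely need a compressed layout. This is one of the storage concerns the abstract says FIHS addresses.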


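YAN Zhen, PI De-chang, WU Wen-hao. Research on Frequent Itemsets Mining Algorithm Based on High-dimensional Sparse Dataset[J]. Computer Science, 2011, 38(6): 183-186. https://doi.org/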

