Computer Science, 2010, Vol. 37, Issue (12): 145-148.

• Database and Data Mining •

An Algorithm Combining Hashing and a Bit Table for Mining Frequent Itemsets

任永功,宋奎勇,寇香霞   

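  1. (School of Computer and Information Technology, Liaoning Normal University, Dalian 116029)
  • Publication date: 2018-12-01 Release date: 2018-12-01
  • Funding:
    This work was supported by the National Natural Science Foundation of China (60603047), the Liaoning Province Science and Technology Plan Project (200821 607), the Dalian Outstanding Young Scientific and Technological Talent Fund (2008J23JH026), and the Ministry of Education Scientific Research Foundation for Returned Overseas Scholars.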

Algorithm Combination of Hash and BitTable for Mining Frequent Itemsets

REN Yong-gong,SONG Kui-yong,KOU Xiang-xia   

  • Online:2018-12-01 Published:2018-12-01

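Abstract: In frequent itemset mining, many algorithms are based on Apriori. These algorithms share two problems: first, the whole database is loaded into memory, occupying a great deal of space; second, much time is spent generating candidate itemsets and computing their support. To improve efficiency, Hash-BFI, an algorithm for mining frequent itemsets based on a bit table, is proposed. The database is compressed into the bit table along the horizontal and vertical directions, greatly reducing memory use. A hash function is introduced to compute the frequent 2-itemsets; candidate itemsets are obtained, their support is counted, and pruning is carried out entirely through AND and OR operations, which improves the efficiency of the algorithm.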

Keywords: Apriori, frequent itemsets, bit table, hashing

Abstract: In frequent itemset mining, many algorithms are based on Apriori. These algorithms have two common problems. First, the entire database must be loaded into memory, which occupies a large amount of space. Second, generating candidate itemsets and computing their support takes a lot of time. To improve efficiency, Hash-BFI, a BitTable-based algorithm for mining frequent itemsets, is proposed. The database is compressed into the BitTable in both the horizontal and vertical directions, which saves considerable space. A hash function is used to compute the frequent 2-itemsets, and AND and OR operations alone are used to generate candidate itemsets, compute their support, and perform pruning. Together, these measures improve the efficiency of the algorithm.

Key words: Apriori, Frequent itemsets, BitTable, Hash
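
The abstract names the ingredients of Hash-BFI without giving details, so the following is only a minimal Python sketch of how those ingredients can fit together. It is not the authors' implementation. The function name `mine_frequent_itemsets`, the bucket function `(a * n + b) % n_buckets`, the bucket count, and the use of Python integers as bit vectors are all assumptions made for illustration. Rows of the bit table are bitmasks over items (horizontal direction), columns are bitmasks over transaction ids (vertical direction), candidates are formed with OR, and support is counted with AND.

```python
from itertools import combinations


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def mine_frequent_itemsets(transactions, min_sup, n_buckets=7):
    """Return {frozenset(items): support} for itemsets with support >= min_sup.

    Illustrative sketch only; the bucket function and parameters are assumed.
    """
    items = sorted({i for t in transactions for i in t})
    n = len(items)
    index = {item: k for k, item in enumerate(items)}

    # Horizontal bit table: one bitmask over item indices per transaction.
    rows = [sum(1 << index[i] for i in set(t)) for t in transactions]
    # Vertical bit table: one bitmask over transaction ids per item.
    cols = [sum(1 << tid for tid, row in enumerate(rows) if (row >> k) & 1)
            for k in range(n)]

    def members(mask):
        """Item indices whose bit is set in an itemset bitmask."""
        return [k for k in range(n) if (mask >> k) & 1]

    def support(mask):
        # AND the vertical columns of every item in the itemset.
        tids = (1 << len(rows)) - 1
        for k in members(mask):
            tids &= cols[k]
        return popcount(tids)

    # Frequent 1-itemsets, keyed by their item bitmask.
    frequent = {1 << k: popcount(cols[k]) for k in range(n)
                if popcount(cols[k]) >= min_sup}

    # Hash pass (assumed bucket function): count every pair in each transaction.
    buckets = [0] * n_buckets
    for row in rows:
        for a, b in combinations(members(row), 2):
            buckets[(a * n + b) % n_buckets] += 1

    # Frequent 2-itemsets: a pair is checked only if its bucket is heavy enough.
    level = {}
    singles = [k for k in range(n) if (1 << k) in frequent]
    for a, b in combinations(singles, 2):
        if buckets[(a * n + b) % n_buckets] >= min_sup:
            s = popcount(cols[a] & cols[b])
            if s >= min_sup:
                level[(1 << a) | (1 << b)] = s

    k = 2
    while level:
        frequent.update(level)
        next_level = {}
        for x, y in combinations(level, 2):
            cand = x | y  # OR joins two frequent k-itemsets
            if popcount(cand) != k + 1 or cand in next_level:
                continue
            # Prune: every k-subset of the candidate must itself be frequent.
            if all((cand & ~(1 << i)) in level for i in members(cand)):
                s = support(cand)
                if s >= min_sup:
                    next_level[cand] = s
        level = next_level
        k += 1

    return {frozenset(items[i] for i in members(m)): s
            for m, s in frequent.items()}
```

A bucket count is always at least as large as the count of any single pair that hashes into it, so the bucket test in this sketch can skip infrequent pairs but never discards a frequent one. The result therefore does not depend on the assumed bucket function, only the amount of work does.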
