Computer Science ›› 2011, Vol. 38 ›› Issue (4): 216-220.

• Databases and Data Mining •

A Fast Association Rule Mining Algorithm for Large Dense Databases Based on Vertical Data Layout

崔建,李强,杨龙坡   

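  1. (Department of Early Warning Surveillance Intelligence, Air Force Radar Academy, Wuhan 430019)
  • Publication date: 2018-11-16  Release date: 2018-11-16
  • Funding:
    This work was supported by the National Natural Science Foundation of China (60736009).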

Fast Algorithm for Mining Association Rules Based on Vertically Distributed Data in Large Dense Databases

CUI Jian,LI Qiang,YANG Long-po   

  • Online:2018-11-16 Published:2018-11-16

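Abstract: To further address the high CPU time cost and frequent I/O operations incurred when mining association rules from large transaction databases, an improved association rule mining algorithm based on vertical data layout, called VARMLDb, is presented. The algorithm first divides the database into several partitions, each of which fits in memory, and then combines a directed acyclic graph (DAG) with the vertical data format diffset (difference set) to store and compute frequent itemsets. This greatly reduces the memory needed to store intermediate results and resolves the low efficiency of traditional vertical mining algorithms on dense databases, so the algorithm is well suited to association rule mining in large dense databases. Drawing on the strengths of the CARMA algorithm, the whole algorithm completes the mining process with only two scans of the database. Experimental results show that the algorithm is correct and that VARMLDb achieves high execution efficiency on large dense databases.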

Keywords: CARMA algorithm, DAG, diffset (difference set), vertical data layout, dense database

Abstract: To further reduce the CPU and I/O overhead incurred by traditional algorithms when mining association rules from large transaction databases, an improved association rule mining algorithm based on vertical data layout, named VARMLDb (Vertical Association Rule Mining for Large Databases), is proposed. The algorithm first divides the database into several partitions, each of which fits in the available memory. It then combines directed acyclic graphs with diffsets (differences of tidlist sets), a vertical data layout structure, to store and compute frequent itemsets. This not only greatly reduces the memory needed to hold intermediate results but also overcomes the low efficiency of traditional vertical mining algorithms on dense databases, making the algorithm more effective for large dense databases. By drawing on the advantages of the CARMA (continuous association rule mining) algorithm, the algorithm needs to scan the database only twice. Experimental results show that the algorithm is correct and that VARMLDb achieves higher execution efficiency on large dense transaction databases.

Key words: Continuous association rule mining algorithm, Directed acyclic graphs, Diffset (difference set), Vertically distributed data, Dense databases
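
The diffset idea named in the abstract can be illustrated with a minimal Python sketch. This is not the VARMLDb algorithm itself: the abstract gives no pseudocode, and the partitioning step, the DAG storage, and the CARMA-style two-scan scheme are all omitted here. The sketch only shows the general diffset technique from vertical mining, where each itemset stores the transaction ids it loses relative to its prefix rather than the ids it covers, which keeps the stored sets small on dense data. All function names and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Itemset = Tuple[str, ...]


def vertical_tidsets(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Convert horizontal transactions into a vertical item -> tidset map."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def mine_with_diffsets(transactions: List[Set[str]],
                       min_support: int) -> Dict[Itemset, int]:
    """Return every frequent itemset with its absolute support count."""
    n = len(transactions)
    all_tids = set(range(n))
    tidsets = vertical_tidsets(transactions)

    # Level 1: the diffset of a single item is taken against the empty
    # prefix, i.e. the ids of the transactions that do NOT contain it.
    level = []
    for item in sorted(tidsets):
        diff = all_tids - tidsets[item]
        support = n - len(diff)
        if support >= min_support:
            level.append(((item,), diff, support))

    frequent: Dict[Itemset, int] = {}

    def extend(members) -> None:
        # All members share the same prefix P and differ in the last item.
        for i, (items_x, diff_x, sup_x) in enumerate(members):
            frequent[items_x] = sup_x
            children = []
            for items_y, diff_y, _ in members[i + 1:]:
                # d(PXY) = d(PY) - d(PX); support(PXY) = support(PX) - |d(PXY)|
                diff_xy = diff_y - diff_x
                sup_xy = sup_x - len(diff_xy)
                if sup_xy >= min_support:
                    children.append((items_x + (items_y[-1],), diff_xy, sup_xy))
            if children:
                extend(children)

    extend(level)
    return frequent


# Hypothetical sample data: a small, fairly dense transaction set.
sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]
result = mine_with_diffsets(sample, min_support=3)
```

On this sample each single item appears in four of the five transactions, so every level-1 diffset holds one id where the corresponding tidset would hold four. That size gap is the reason diffsets are attractive for dense databases.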
