Computer Science (计算机科学), 2009, Vol. 36, Issue (7): 247-251. doi: 10.11896/j.issn.1002-137X.2009.07.061

• Artificial Intelligence •

An Efficient Algorithm for Mining Frequent Patterns over Offline Data Streams

HOU Wei (侯伟), WU Chen-sheng (吴晨生), YANG Bing-ru (杨炳儒), FANG Wei-wei (方炜炜)

  1. School of Information Engineering, University of Science and Technology Beijing, Beijing 100083; Beijing Institute of Science and Technology Information, Beijing 100037
  • Online: 2018-11-16 Published: 2018-11-16
  • Funding:
    This work was supported by the National Natural Science Foundation of China (60675030).

Abstract: Mining frequent patterns from data streams is one of the most active research topics in data mining. The continuity, disorder, unboundedness and real-time nature of data streams impose stricter time and space requirements on mining algorithms. Because the frequencies of patterns in a data stream vibrate (oscillate), existing algorithms are forced to revise their synopsis structures continually, which significantly degrades both their time and space efficiency. This paper first designs SP-tree, a more space-efficient synopsis structure, and defines a vibration factor X to quantify vibration information. It then proposes SPDS, an efficient algorithm for mining frequent patterns over offline data streams, which effectively reduces the impact of vibration on algorithm performance. When processing a newly arrived dataset, SPDS adopts a divide-and-conquer separation-mapping strategy, which further improves its time efficiency. SPDS also improves the counting accuracy of some patterns in the query results.

Key words: Data mining, Data stream, Frequent pattern (FP), Vibration factor
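The abstract does not define the vibration factor X, the SP-tree, or SPDS itself. As a purely illustrative sketch, assuming that "vibration" means a pattern repeatedly crossing the minimum-support threshold from one batch of the stream to the next, the Python code below measures how often a pattern's frequent/infrequent status flips across batches. The functions `batch_support` and `vibration_ratio`, the threshold `min_sup`, and the toy batches are all hypothetical; this is not the authors' definition of X or any part of SPDS.

```python
from typing import Iterable, Sequence, Set


def batch_support(batch: Sequence[Set[str]], pattern: Iterable[str]) -> float:
    """Relative support of an itemset `pattern` in one batch of transactions."""
    if not batch:
        return 0.0  # an empty batch contains no occurrences
    p = set(pattern)
    return sum(1 for t in batch if p <= set(t)) / len(batch)


def vibration_ratio(supports: Sequence[float], min_sup: float) -> float:
    """Hypothetical oscillation measure (NOT the paper's vibration factor X).

    Returns the fraction of consecutive batch pairs in which the pattern
    switches between frequent (support >= min_sup) and infrequent.
    """
    if len(supports) < 2:
        return 0.0  # fewer than two batches: no transitions to observe
    states = [s >= min_sup for s in supports]
    flips = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return flips / (len(states) - 1)


if __name__ == "__main__":
    # Three toy batches; the pattern {a, b} is frequent, then absent, then frequent.
    batches = [
        [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"b"}],
        [{"a"}, {"c"}, {"b", "c"}, {"c"}],
        [{"a", "b"}, {"a", "b"}, {"b"}, {"a", "c"}],
    ]
    sups = [batch_support(b, {"a", "b"}) for b in batches]
    print(sups)                                # [0.5, 0.0, 0.5]
    print(vibration_ratio(sups, min_sup=0.4))  # 1.0
```

Under this assumption, a pattern with a ratio near 1 is one that an algorithm keeping only currently frequent patterns would have to insert into and delete from its synopsis structure again and again, which is the kind of maintenance cost the abstract attributes to vibration.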
