Computer Science ›› 2011, Vol. 38 ›› Issue (10): 157-160.

• Databases and Data Mining •

△-tree Based Similarity Join Algorithm for High-dimensional Data

LIU Yan, HAO Zhong-xiao

  1. (School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080); (College of Computer Science and Technology, Changchun University, Changchun 130022); (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: Similarity joins are used in a variety of fields, such as clustering, text mining, and multimedia databases. To solve the problem of high-dimensional similarity joins in a main-memory environment, a novel similarity join algorithm called △-tree-join* is presented, which efficiently joins two different data sets based on the △-tree, an index that has been proven efficient in main memory. △-tree-join* adopts a top-down join scheme and makes full use of the properties of the △-tree to compute the distances between clusters, and between points and clusters, using fewer dimensions. These distances are used to filter out unnecessary nodes and points, which reduces computation and improves join efficiency. Experiments on both synthetic clustered datasets and real datasets demonstrate that △-tree-join* is better suited to main-memory similarity joins and outperforms the two state-of-the-art similarity join methods, EGO and EGO*.

Key words: Similarity join, High-dimensional space, Main-memory, Data mining, Similarity search
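
The filtering idea mentioned in the abstract, computing distances with fewer dimensions to discard candidates early, can be illustrated with a small sketch. This is not the △-tree-join* algorithm itself: it builds no △-tree and does no cluster-level pruning. It only assumes Euclidean distance and shows that the distance over the first k dimensions is a lower bound on the full distance, so any pair whose partial distance already exceeds the threshold can be skipped. The names `partial_distance`, `epsilon_join`, `eps`, and `k` are hypothetical and chosen for illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Point = Sequence[float]


def partial_distance(p: Point, q: Point, k: int) -> float:
    """Euclidean distance over the first k dimensions only.

    Dropping dimensions can only remove non-negative squared terms,
    so this value never exceeds the full Euclidean distance.
    """
    return sqrt(sum((a - b) ** 2 for a, b in zip(p[:k], q[:k])))


def epsilon_join(
    r: List[Point], s: List[Point], eps: float, k: int
) -> Tuple[List[Tuple[int, int]], int]:
    """Join two point sets r and s with distance threshold eps.

    Returns the matching index pairs (i, j) and the number of
    full-dimensional distance computations that were needed.
    Assumption: a plain nested loop stands in for the index traversal.
    """
    pairs: List[Tuple[int, int]] = []
    full_computations = 0
    for i, p in enumerate(r):
        for j, q in enumerate(s):
            # Cheap filter: if the lower bound is already too large,
            # the full distance cannot be within eps.
            if partial_distance(p, q, k) > eps:
                continue
            full_computations += 1
            if partial_distance(p, q, len(p)) <= eps:
                pairs.append((i, j))
    return pairs, full_computations


if __name__ == "__main__":
    r = [(0.0, 0.0, 0.0, 0.0), (5.0, 5.0, 5.0, 5.0)]
    s = [(0.1, 0.0, 0.0, 0.1), (5.0, 5.0, 9.0, 9.0), (0.0, 0.0, 3.0, 0.0)]
    pairs, full_computations = epsilon_join(r, s, eps=0.5, k=2)
    print(pairs, full_computations)
```

Because the filter is a true lower bound, it never discards a pair that belongs in the result; it only reduces how many full-dimensional distances have to be computed. How much it saves depends on how well the leading dimensions separate the data.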
