△-tree Based Similarity Join Algorithm for High-dimensional Data

Computer Science, 2011, Vol. 38, Issue (10): 157-160.


LIU Yan, HAO Zhong-xiao

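(School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080); (College of Computer Science and Technology, Changchun University, Changchun 130022); (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001)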

Published: 2018-11-16  Online: 2018-11-16


Abstract

Similarity joins are used in a variety of fields, such as clustering, text mining, and multimedia databases. To address high-dimensional similarity joins in main memory, this paper presents △-tree-join*, a main-memory similarity join algorithm that joins two different data sets using the △-tree, an index that has been shown to be efficient in main memory. △-tree-join* works top-down and exploits the properties of the △-tree to compute cluster-to-cluster and point-to-cluster distances over a reduced number of dimensions. These distances are used to filter out unnecessary nodes and data points, which reduces computation and improves join efficiency. Experiments on a synthetic clustered data set and on real data sets indicate that △-tree-join* is better suited to main-memory similarity joins and outperforms EGO and EGO*, the two state-of-the-art similarity join methods.

Keywords: similarity join, high-dimensional space, main memory, data mining, similarity search
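The filtering idea in the abstract, using a distance over fewer dimensions to discard candidates before any full-distance computation, can be illustrated with a minimal sketch. This is not the △-tree-join* algorithm: it has no tree, no clusters, and no top-down traversal. It only shows why a partial-dimension Euclidean distance is a safe filter: it never exceeds the full distance, so a pair whose partial distance is already greater than the threshold cannot be a join result. The names `partial_dist`, `filtered_join`, `eps`, and `k` are hypothetical and chosen for illustration.

```python
import math
from itertools import product


def partial_dist(p, q, k):
    """Euclidean distance computed over the first k coordinates only."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p[:k], q[:k])))


def full_dist(p, q):
    """Euclidean distance over all coordinates."""
    return partial_dist(p, q, len(p))


def filtered_join(R, S, eps, k):
    """Return (pairs, skipped) for the eps-similarity join of R and S.

    pairs   : index pairs (i, j) with full_dist(R[i], S[j]) <= eps
    skipped : number of candidate pairs discarded by the k-dimension filter

    Assumption (illustrative, not from the paper): every pair is tested
    one by one; a real index would prune whole groups of points at once.
    """
    pairs = []
    skipped = 0
    for (i, p), (j, q) in product(enumerate(R), enumerate(S)):
        # Cheap lower bound: if it exceeds eps, the full distance does too.
        if partial_dist(p, q, k) > eps:
            skipped += 1
            continue
        # Exact check on all dimensions for the surviving candidates.
        if full_dist(p, q) <= eps:
            pairs.append((i, j))
    return pairs, skipped


if __name__ == "__main__":
    R = [(0.0, 0.0, 0.0, 0.0), (5.0, 5.0, 0.0, 0.0)]
    S = [(0.1, 0.0, 0.0, 0.1), (0.0, 0.0, 3.0, 0.0), (9.0, 9.0, 9.0, 9.0)]
    pairs, skipped = filtered_join(R, S, eps=0.5, k=2)
    print(pairs, skipped)
```

Because the filter is a lower bound, the result is identical to a brute-force join for any choice of `k`; only the number of full-distance computations changes.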

LIU Yan, HAO Zhong-xiao. △-tree Based Similarity Join Algorithm for High-dimensional Data[J]. Computer Science, 2011, 38(10): 157-160. https://doi.org/
