计算机科学 ›› 2017, Vol. 44 ›› Issue (Z11): 494-497.doi: 10.11896/j.issn.1002-137X.2017.11A.105

• 综合、交叉与应用 • 上一篇    下一篇

CS-Chord:基于聚类分离的分布式高维向量索引

袁鑫攀,汪灿飞,龙军,彭成   

  1. 湖南工业大学计算机学院 株洲412007;湖南工业大学智能信息感知及处理技术湖南省重点实验室 株洲412007,湖南工业大学计算机学院 株洲412007;湖南工业大学智能信息感知及处理技术湖南省重点实验室 株洲412007,中南大学信息科学与工程学院 长沙410083,湖南工业大学计算机学院 株洲412007;湖南工业大学智能信息感知及处理技术湖南省重点实验室 株洲412007
  • 出版日期:2018-12-01 发布日期:2018-12-01
  • 基金资助:
    本文受国家自然科学基金项目(61402165),湖南省自然科学基金项目(2015JJ3058),湖南省重点研发计划(2016JC2018),国家基金应急管理项目(S1651002),2016年湖南工业大学研究生校级创新基金项目(CX1606)资助

CS-Chord:Distributed High Dimensional Vector Index Based on Clustering Separation

YUAN Xin-pan, WANG Can-fei, LONG Jun and PENG Cheng   

  • Online:2018-12-01 Published:2018-12-01

摘要: M-Chord是一种基于P2P网络的高维向量索引,其聚类边缘的向量容易与搜索圆频繁相交,使得查找的区域增多,降低了M-Chord的效率。提出一种基于聚类分离的分布式高维向量索引(CS-Chord),将边缘区域的高频检索向量从Chord环中分离出来,集中存储在服务器上,中心区域的向量仍存储于Chord环中,节省了大量资源的定位时间,从而提高检索效率。实验结果表明:在查询半径为0.2时,CS-Chord距离计算次数约为2000,比M-Chord减少了约2500次;CS-Chord消息转发次数约降低150次,仅为M-Chord的50%。

关键词: 高维向量,聚类,Chord,分布式索引

Abstract: M-Chord is a high dimensional vector index based on the P2P network,and the sparse vector of the clustering edge is easy to intersect with the query circle,which makes the area of the query increased,and the efficiency of M-Chord is reduced.Based on the idea of M-Chord,this paper proposed a distributed high dimensional vector index (CS-Chord) based on clustering,which separates the edge vectors from the dense vectors.The vector of the central region is still stored in the Chord loop,which saves a lot of resources locating time and improves the efficiency of retrieval.Experimental results show that when the query radius is 0.2,the number of CS-Chord distance computation 2000,reducing about 2500 times compared with M-Chord.The message forwarding times of CS-Chord decreases about 150 times,only 50% of M-Chord.

Key words: High-dimensional vector,Cluster,Chord,Distributed index

[1] SCHUH M A,WYLIE T,BANDA J M,et al.A comprehensive study of iDistance partitioning strategies for knn queries and high-dimensional data indexing[M]∥Big Data.Springer Berlin Heidelberg,2013:238-252.
[2] ZHOU K,HOU Q,WANG R,et al.Real-time KD-tree con-struction on graphics hardware[J].ACM Transactions on Graphics,2008,27(5):126.
[3] ARGE L,BERG M D,HAVERKORT H.The priority R-tree:A practically efficient and worst-case optimal R-tree[J].Acm Transactions on Algorithms,2008,4(1):1-30.
[4] LIU X.The Dirac Operator on Regular Metric Trees[J].Phy-sics,2015,226:165-178.
[5] NOVAK D,ZEZULA P.M-Chord:a scalable distributed similarity search structure[C]∥International Conference on Scalable Information Systems(Infoscale 2006).Hong Kong,2006:1-10.
[6] ALY M,MUNICH M,PERONA P.Distributed Kd-Trees for Ultra Large Scale Object Recognition[C]∥British Machine Vision Conference.2011:1-11.
[7] YIN W,ZHU M,JIANG L.R-Chord:A distributed similarityretrieval system with RPCID[C]∥IEEE International Confe-rence on Network Infrastructure and Digital Content,2009.IcNidc,2009:393-399.
[8] 吴纯青,任沛阁,王小峰.基于语义的网络大数据组织与搜索[J].计算机学报,2015,38(1):1-17.
[9] STOICA I,MORRIS R,KARGER D,et al.Chord:A scalablepeer-to-peer lookup service for internet applications[J].IEEE/ACM Transactions on Networking,2003,11(4):149-160.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!