计算机科学 ›› 2016, Vol. 43 ›› Issue (10): 190-192.doi: 10.11896/j.issn.1002-137X.2016.10.035

• 软件与数据库技术 • 上一篇    下一篇

基于CUDA的并行K-近邻连接算法实现

潘茜,张育平,陈海燕   

  1. 南京航空航天大学计算机科学与技术学院 南京210016,南京航空航天大学计算机科学与技术学院 南京210016,南京航空航天大学计算机科学与技术学院 南京210016
  • 出版日期:2018-12-01 发布日期:2018-12-01
  • 基金资助:
    本文受国家重点基础研究发展计划(973计划)(2014CB744900),南京航空航天大学研究生创新基地开放基金(KFJJ201460)资助

Implementation of Parallel K-Nearest Neighbor Join Algorithm Based on CUDA

PAN Qian, ZHANG Yu-ping and CHEN Hai-yan   

  • Online:2018-12-01 Published:2018-12-01

摘要: 针对大规模空间数据的K-近邻连接查询问题,设计了一种CUDA编程模型下K-近邻连接算法的并行优化方法。将K-近邻连接算法的并行过程分两个阶段:1)对参与查询的数据集P和Q分别建立R-Tree索引;2)基于R-Tree索引进行KNNJ查询。首先根据结点所在位置划分最小外包框,在CUDA下基于递归网格排序算法创建R-Tree索引。然后在CUDA下基于R-Tree索引进行KNNJ查询,其中涉及并行求距离和并行距离排序两个阶段:求距离阶段利用每一个线程计算任意两点之间的距离,点与点之间距离的求取无依赖并行;排序阶段将快速排序基于CUDA以实现并行化。实验结果表明,随着样本量的不断增大,基于R-Tree索引的并行K-近邻连接算法的优势更加明显,具有高效性和可扩展性。

关键词: CUDA,K-近邻连接,空间查询,并行计算,R-Tree索引

Abstract: In order to solve the problem of K-nearest neighbor join query in large scale spatial data,a parallel optimization method of K-nearest neighbor join algorithm based on CUDA programming model was designed.The parallel process of K-nearest neighbor join algorithm is divided into two stages.One is to establish the R-Tree index for the data set Q and P participate in the query,and the other is to carry out the KNNJ query based on R-Tree index.Firstly,MBR is created according to the location of nodes,and the R-Tree index is created based on SRT by CUDA.Then,the KNNJ query is made based on the R-Tree index,including parallel computing and parallel sorting.The distance between two points can be calculated by each thread on the parallel,and quicksort is executed in parallel on the CUDA.Experimental results show that with the increase of sample size,the advantages of parallel K-nearest neighbor algorithm are more obvious,which has high efficiency and scalability.

Key words: CUDA,K-neighbor join,Spatial query,Parallel computing,R-Tree index

[1] Bohm C,Krebs F.The k-nearest neighbor join:Turbo charging the KDD process[J].Knowledge Information System,2004:6(6):728-749
[2] Xia C Y,Lu H J,Coi B C,et al.Gorder:An efficient method for KDD joins processing[C]∥Proc.of the 30th Int’l Conf.on VeryLarge Data Bases.2004:756-767
[3] Yu C,Cui B,Wang S G,et al.Efficient index-based KNN join processing for high-dimensional data[J].Information and Software Technology,2007,9(4):332-344
[4] Yao B,Li F F,Kumar P.K nearest neighbor queries and KNN-joins in large relational databases (almost) for free[C]∥Proc.of the 26th Int’l Conf.on Data Engineering (ICDE).2010:4-15
[5] Wu En-hua.Technology,current situation and challenge ofgraphics processor are used for general computing[J].Journal of Software,2004,15(10):1493-1504(in Chinese) 吴恩华.图形处理器用于通用计算的技术、现状及其挑战[J].软件学报,2004,5(10):1493-1504
[6] Xu Xue-gui,Zhang Qing.High efficiency parallel remote sensing image processing based on CUDA[J].Geospatial Information,2011,9(6):47-54(in Chinese)许雪贵,张清.基于CUDA的高效并行遥感影像处理[J].地理空间信息,2011,9(6):47-54
[7] Dong Luo,Ge Wan-cheng,Chen Kang-li.Application Research of CUDA parallel computing [J].Information Technology,2010,4(5):11-15(in Chinese)董荦,葛万成,陈康力.CUDA并行计算的应用研究[J].信息技术,2010,4(5):11-15
[8] Liu Yi,Jing Ning,Chen Luo,et al.Algorithm for Processing k-Nearest Join Based on R-Tree in MapReduce[J].Journal of Software,2013,24(8):1836-1851(in Chinese)刘义,景宁,陈荦,等.MapReduce框架下基于R-树的k-近邻连接算法[J].软件学报,2013,24(8):1836-1851
[9] Sosutha S,Mohana D.Heterogeneous parallel computing usingCUDA for chemical process [J].Procedia Computer Science,2015,47(1):237-246
[10] Leutenegger S T,Lopez M A,Edgington J.STR:A Simple and Efficient Algorithm for R-tree Packing [C]∥The 13th International Conference on Data Engineering.Birmingham,England,1997:497-506
[11] Hoare C A R.Quicksort [J].The Computer J.,1962,15(1):10-15

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!