计算机科学 ›› 2015, Vol. 42 ›› Issue (5): 204-210.doi: 10.11896/j.issn.1002-137X.2015.05.041
王 飞,秦小麟,刘 亮,沈 尧
WANG Fei, QIN Xiao-lin, LIU Liang and SHEN Yao
摘要: k-近邻连接查询是空间数据库中一种常用的操作,该查询处理过程涉及连接和最近邻查询两个复杂操作。传统的集中式k-近邻连接查询算法已不能适应当前呈爆炸式增长的数据规模,设计分布式k-近邻连接查询算法成为了目前亟需解决的问题。现有的分布式k-近邻连接查询算法都包括了多轮串行的MapReduce任务,而每个MapReduce任务均需要读写分布式文件系统,导致MapReduce不能有效表达多个任务之间的依赖关系,因此算法效率低下。首先提出了一种基于数据流的计算框架,该框架建立在MapReduce之上,将数据处理过程按照数据流图建模。在该框架基础上,提出了一种高效的k-近邻连接算法,它利用空间填充曲线将多维数据映射为一维数据,从而将k-近邻连接查询转化为一维范围查询。实验结果表明,该算法的可扩展性较高,且效率比现有算法更优。
[1] Bhm C,Krebs F.The k-nearest neighbour join:Turbo charging the KDD process[J].Knowledge and Information Systems,2004,6(6):728-749 [2] Kavraki L E,Plaku E.Distributed Computation of the knnGraph for Large High-Dimensional Point Sets[J].Parallel Distributed Computation,2007,7(3):346-359 [3] Sardana D,Bhatnagar R.Graph Clustering Using Mutual K-Nearest Neighbors[M]∥Active Media Technology.Springer International Publishing,2014:35-48 [4] Xia C,Lu H,Ooi B C,et al.Gorder:an efficient method for KNN join processing[C]∥Proceedings of the Thirtieth international conference on VLDB Endowment.2004:756-767 [5] Yu C,Cui B,Wang S,et al.Efficient index-based KNN join processing for high-dimensional data[J].Information and Software Technology,2007,49(4):332-344 [6] Cheung K L,Fu A W C.Enhanced nearest neighbour search on the R-tree[J].ACM SIGMOD Record,1998,27(3):16-21 [7] Dean J,Ghemawat S.MapReduce:simplified data processing on large clusters[J].Communications of the ACM,2008,51(1):107-113 [8] Zhang C,Li F,Jestes J.Efficient parallel kNN joins for large data in MapReduce[C]∥Proceedings of the 15th International Conference on Extending Database Technology.ACM,2012:38-49 [9] Lu W,Shen Y,Chen S,et al.Efficient processing of k nearest neighbor joins using mapreduce[J].Proceedings of the VLDB Endowment,2012,5(10):1016-1027 [10] 刘义,景宁,陈荦,等.MapReduce 框架下基于 R-树的 k-近邻连接算法[J].软件学报,2013,4(8):1836-1851 [11] Apache.Hadoop[EB/OL].2014-4-10[2014-4-22].http://ha-doop.apache.org/ [12] Liu Y,Cui J,Huang Z,et al.SK-LSH:An Efficient Index Structure for Approximate Nearest Neighbor Search[J].Proceedings of the VLDB Endowment,2014,7(9):745-756 [13] Choi W,Oh S.Fast nearest neighbor search algorithm using the cache technique[J].Advanced Robotics,2013,27(15):1175-1187 [14] Gieseke F,Heinermann J,Oancea C,et al.Buffer kd trees:processing massive nearest neighbor queries on GPUs[C]∥Proceedings of The 31st International Conference on Machine Learning.2014:172-180 [15] Luo G,Naughton J F,Ellmann C J.A non-blocking parallel spatial join algorithm[C]∥ Proceedings of the 2002 IEEE 18th International Conference on Data Engineering.2002:697-705 [16] 何洪辉,王丽珍,周丽华.pgi-distance:一种高效的并行KNN-join处理方法[J].计算机研究与发展,2007,4(10):1774-1781 [17] Mutenda L,Kitsuregawa M.Parallel r-tree spatial join for ashared-nothing architecture[C]∥Proceedings of the 1999 IEEE International Symposium on Database Applications in Non-Traditional Environments.1999:423-430 [18] Yu C,Cui B,Wang S,et al.Efficient index-based KNN join processing for high-dimensional data[J].Information and Software Technology,2007,49(4):332-344 [19] Yao B,Li F,Kumar P.K nearest neighbor queries andknn-joins in large relational databases (almost) for free[C]∥ Proceedings of the 2010 IEEE 26th International Conference on Data Engineering.2010:4-15 |
No related articles found! |
|