Computer Science ›› 2015, Vol. 42 ›› Issue (5): 204-210.doi: 10.11896/j.issn.1002-137X.2015.05.041

Previous Articles     Next Articles

Algorithm for k-Nearest Neighbor Join Based on Data Stream

WANG Fei, QIN Xiao-lin, LIU Liang and SHEN Yao   

  • Online:2018-11-14 Published:2018-11-14

Abstract: kNN join is a frequently used operation in spatial database.It involves both the join and the NN search.Data scale is exploding,and traditional centralized algorithm can not meet the requirements.It is an urgent problem to design distributed kNN join algorithm currently.Existing distributed algorithms include several rounds of serial MapReduce tasks,but each MapReduce task reads and writes data from distributed file system.It is inefficient when expressing dependencies between jobs,and these algorithms are inefficient.Firstly,we proposed a framework based on data stream on MapReduce.This framework models data handle process according to the data flow diagram,and we proposed an efficient kNN join algorithm on the framework.The algorithm maps multi-dimensional data sets into one dimension using space-filling curves (z-values),and transforms kNN joins into a sequence of one-dimensional range searches.Experimental results demonstrate that the algorithm can efficiently resolve the large scale kNN join spatial query.

Key words: kNN join,Data stream,MapReduce,Framework

[1] Bhm C,Krebs F.The k-nearest neighbour join:Turbo charging the KDD process[J].Knowledge and Information Systems,2004,6(6):728-749
[2] Kavraki L E,Plaku E.Distributed Computation of the knnGraph for Large High-Dimensional Point Sets[J].Parallel Distributed Computation,2007,7(3):346-359
[3] Sardana D,Bhatnagar R.Graph Clustering Using Mutual K-Nearest Neighbors[M]∥Active Media Technology.Springer International Publishing,2014:35-48
[4] Xia C,Lu H,Ooi B C,et al.Gorder:an efficient method for KNN join processing[C]∥Proceedings of the Thirtieth international conference on VLDB Endowment.2004:756-767
[5] Yu C,Cui B,Wang S,et al.Efficient index-based KNN join processing for high-dimensional data[J].Information and Software Technology,2007,49(4):332-344
[6] Cheung K L,Fu A W C.Enhanced nearest neighbour search on the R-tree[J].ACM SIGMOD Record,1998,27(3):16-21
[7] Dean J,Ghemawat S.MapReduce:simplified data processing on large clusters[J].Communications of the ACM,2008,51(1):107-113
[8] Zhang C,Li F,Jestes J.Efficient parallel kNN joins for large data in MapReduce[C]∥Proceedings of the 15th International Conference on Extending Database Technology.ACM,2012:38-49
[9] Lu W,Shen Y,Chen S,et al.Efficient processing of k nearest neighbor joins using mapreduce[J].Proceedings of the VLDB Endowment,2012,5(10):1016-1027
[10] 刘义,景宁,陈荦,等.MapReduce 框架下基于 R-树的 k-近邻连接算法[J].软件学报,2013,4(8):1836-1851
[11] Apache.Hadoop[EB/OL].2014-4-10[2014-4-22].http://ha-doop.apache.org/
[12] Liu Y,Cui J,Huang Z,et al.SK-LSH:An Efficient Index Structure for Approximate Nearest Neighbor Search[J].Proceedings of the VLDB Endowment,2014,7(9):745-756
[13] Choi W,Oh S.Fast nearest neighbor search algorithm using the cache technique[J].Advanced Robotics,2013,27(15):1175-1187
[14] Gieseke F,Heinermann J,Oancea C,et al.Buffer kd trees:processing massive nearest neighbor queries on GPUs[C]∥Proceedings of The 31st International Conference on Machine Learning.2014:172-180
[15] Luo G,Naughton J F,Ellmann C J.A non-blocking parallel spatial join algorithm[C]∥ Proceedings of the 2002 IEEE 18th International Conference on Data Engineering.2002:697-705
[16] 何洪辉,王丽珍,周丽华.pgi-distance:一种高效的并行KNN-join处理方法[J].计算机研究与发展,2007,4(10):1774-1781
[17] Mutenda L,Kitsuregawa M.Parallel r-tree spatial join for ashared-nothing architecture[C]∥Proceedings of the 1999 IEEE International Symposium on Database Applications in Non-Traditional Environments.1999:423-430
[18] Yu C,Cui B,Wang S,et al.Efficient index-based KNN join processing for high-dimensional data[J].Information and Software Technology,2007,49(4):332-344
[19] Yao B,Li F,Kumar P.K nearest neighbor queries andknn-joins in large relational databases (almost) for free[C]∥ Proceedings of the 2010 IEEE 26th International Conference on Data Engineering.2010:4-15

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!