Computer Science ›› 2016, Vol. 43 ›› Issue (7): 251-254. doi: 10.11896/j.issn.1002-137X.2016.07.045

PODKNN: A Parallel Outlier Detection Algorithm for Large Dataset

GOU Jie, MA Zi-tang and ZHANG Zhe-cheng   

  • Online:2018-12-01 Published:2018-12-01

Abstract: To improve the efficiency of outlier detection on large-scale datasets, a parallel outlier detection algorithm based on K-nearest neighbors, PODKNN, is proposed. The algorithm first preprocesses the dataset with a partitioning strategy, then finds the K-nearest neighbors and calculates the degrees of outliers on the partitioned data, and finally merges the results and selects the outliers. It is designed to fit the MapReduce programming model, which parallelizes the computation and improves efficiency on large-scale datasets. Experimental results show that PODKNN achieves high speedup and good scalability.
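The abstract gives only the outline of the method (partition, find K-nearest neighbors, compute outlier degrees, merge, select), so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It assumes round-robin partitioning, Euclidean distance, and an outlier degree defined as the mean distance to the K nearest neighbors; all function names are hypothetical, and a sequential loop stands in for a real MapReduce runtime. It also assumes the dataset holds more than one point.

```python
import heapq
import math
from collections import defaultdict
from itertools import product


def partition(points, num_parts):
    """Split indexed points into num_parts blocks (round-robin; an assumed strategy)."""
    blocks = [[] for _ in range(num_parts)]
    for idx, p in enumerate(points):
        blocks[idx % num_parts].append((idx, p))
    return blocks


def map_local_knn(query_block, ref_block, k):
    """Map step: for each query point, emit its k smallest distances inside ref_block."""
    for qi, q in query_block:
        dists = [math.dist(q, r) for ri, r in ref_block if ri != qi]
        yield qi, heapq.nsmallest(k, dists)


def reduce_outlier_degree(candidates, k):
    """Reduce step: merge one point's candidates and score it.

    Assumed degree: mean distance to the k nearest neighbors overall.
    """
    nearest = heapq.nsmallest(k, candidates)
    return sum(nearest) / len(nearest)


def detect_outliers(points, k=3, num_parts=2, top_n=1):
    """Return the indices of the top_n points with the largest outlier degree."""
    blocks = partition(points, num_parts)
    merged = defaultdict(list)
    # Every (query block, reference block) pair is an independent map task.
    for query_block, ref_block in product(blocks, repeat=2):
        for idx, local in map_local_knn(query_block, ref_block, k):
            merged[idx].extend(local)
    degrees = {idx: reduce_outlier_degree(c, k) for idx, c in merged.items()}
    return heapq.nlargest(top_n, degrees, key=degrees.get)
```

Because each map task keeps the k smallest distances within its reference block, the union of those candidates always contains a point's true k nearest neighbors, so the merged result matches a single-machine computation. For example, `detect_outliers([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2)` flags the point at index 4.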

Key words: Data mining, Outlier detection, K-nearest neighbors, MapReduce

