Computer Science ›› 2011, Vol. 38 ›› Issue (10): 177-180.

• Databases and Data Mining •

A Weighted KNN-Based Outlier Detection Algorithm for Large Datasets

王茜,杨正宽   

  1. (College of Computer Science, Chongqing University, Chongqing 400044)
  • Publication date: 2018-11-16 Release date: 2018-11-16

Algorithm for Outlier Detection in Large Dataset Based on Weighted KNN

WANG Qian,YANG Zheng-kuan   

  • Online:2018-11-16 Published:2018-11-16

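Abstract: The traditional KNN algorithm, built on distance-based outlier detection, is an algorithm for mining outliers in large datasets. However, KNN judges whether a point is an outlier solely by the distance to its k-th nearest neighbor, which is sometimes inaccurate. This paper presents a KNN-based outlier detection algorithm for large datasets that extends the traditional KNN method by assigning each data point a weight, defined as the average distance to its k nearest neighbors. Outliers are the points with the largest distance to their k-th nearest neighbor and, where those distances are equal, the largest weight. The algorithm improves the accuracy of outlier detection; experiments verify its feasibility and compare its performance with that of the traditional KNN algorithm.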

Keywords: outlier, data mining, weight, partition

Abstract: Traditional KNN is an advanced algorithm, based on distance-based outlier detection, for large datasets. However, this algorithm uses only the distance to the k-th nearest neighbor as the criterion for an outlier, which is inaccurate under certain conditions. This paper presents a weighted KNN outlier detection algorithm for large datasets. The algorithm introduces a weight factor that represents the average distance from a point to its k nearest neighbors. The outliers are those points having the largest distance to their k-th nearest neighbor and, under the same condition, the largest weight. The algorithm improves the accuracy of outlier detection. Experimental results show that the algorithm is feasible in comparison with traditional KNN.

Key words: Outlier,Data mining,Weight,Partition
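
The ranking rule stated in the abstract (largest distance to the k-th nearest neighbor first, ties broken by the largest average distance to the k nearest neighbors) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_knn_outliers`, the parameter `n` (how many outliers to report), the use of Euclidean distance, and the brute-force all-pairs search are assumptions made for illustration. In particular, the sketch omits any partitioning step the paper may use to scale to large datasets.

```python
import math


def weighted_knn_outliers(points, k, n):
    """Return the indices of the top-n outliers (hypothetical helper).

    Each point is scored by the pair
    (distance to k-th nearest neighbor, mean distance to k nearest neighbors).
    Points are ranked by the first value, with the second breaking ties.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")

    scored = []
    for i, p in enumerate(points):
        # Brute-force Euclidean distances to every other point, ascending.
        # Assumption: no spatial index or partition-based pruning is used.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth_distance = dists[k - 1]      # the traditional KNN outlier score
        weight = sum(dists[:k]) / k      # the added weight: mean k-NN distance
        scored.append((kth_distance, weight, i))

    # Largest k-th distance first; equal k-th distances ordered by larger weight.
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [i for _, _, i in scored[:n]]


# One-dimensional toy data, chosen so that two points tie on the k-th distance.
data = [(0,), (3,), (4,), (8,)]
print(weighted_knn_outliers(data, k=2, n=3))  # → [3, 0, 2]
```

With k = 2, the points at 0 and 4 both lie at distance 4 from their second-nearest neighbor, so the traditional criterion cannot order them. Their weights are 3.5 and 2.5 respectively, so the point at 0 is ranked as the stronger outlier.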
