计算机科学 ›› 2013, Vol. 40 ›› Issue (8): 181-185.

• 软件与数据库技术 • 上一篇    下一篇

NLOF:一种新的基于密度的局部离群点检测算法

王敬华,赵新想,张国燕,刘建银   

  1. 华中师范大学计算机学院 武汉430079;华中师范大学计算机学院 武汉430079;华中师范大学计算机学院 武汉430079;华中师范大学计算机学院 武汉430079
  • 出版日期:2018-11-16 发布日期:2018-11-16
  • 基金资助:
    本文受国家自然科学基金项目(61170017)资助

NLOF:A New Density-based Local Outlier Detecting Algorithm

WANG Jing-hua,ZHAO Xin-xiang,ZHANG Guo-yan and LIU Jian-yin   

  • Online:2018-11-16 Published:2018-11-16

摘要: 基于密度的局部离群点检测算法(LOF)的时间复杂度较高且不适用于大规模数据集和高维数据集的离群点检测。通过对LOF算法的分析,提出了一种新的局部离群点检测算法NLOF,该算法的主要思想如下:在数据对象邻域查询过程中,尽可能地利用已知信息优化邻近对象的邻域查询操作,有关邻域的计算查找都采用这种思想。首先通过聚类算法DBSCAN对数据集进行预处理,得到初步的异常数据集。然后利用LOF算法中计算局部异常因子的方法计算初步异常数据集中对象的局部异常程度。在计算数据对象的局部异常因子的过程中,引入去一划分信息熵增量,用去一划分信息熵差确定属性的权重,対属性的权值做具体的量化,在计算各对象之间的距离时采用加权距离。 在真实数据集上 对NLOF算法进行了充分的验证。结果显示,该算法能够提高离群点检测的精度,降低时间复杂度,实现有效的局部离群点的检测。

关键词: 数据挖掘,离群点检测,信息熵,聚类

Abstract: The time complexity of the density-based outlier detecting algorithm (LOF algorithm) is not ideal,which effects its applications in large scale datasets and high dimensional datasets.Under such circumstances,a new density-based outlier detecting algorithm (NLOF algorithm) was introduced.The main idea of the NLOF algorithm is as follows:the known information is used as much as possible to optimize the neighborhood query operation of adjacent objects in the process of neighborhood searching of a data object.This method is adopted in neighborhood computing and searching in this paper.Firstly,clustering algorithm is taken as a preprocessing,and local outlier factors are calculated only for the data objects out of clusters.Secondly,the local outlier factors for the data objects out of clusters are calculated in the same way as calculating the local outlier factors in LOF algorithm.Leave-one partition information gain is introduced in the process of calculating the local outlier factor.The weight of attribute is determined by leave-one partition information gain.The weighted distance is used in calculating distances between objects.Extensive experimental results show the advantages of the proposed method.NLOF algorithm can improve the outlier detection accuracy,reduce the time complexity and realize the effective local outlier detection.

Key words: Data mining,Outlier detection,Information entropy,Clustering

[1] Hawkins D.Identification of Outliers [M].Londen:Chapman and Hall,1980:188
[2] Han Sang-jun,Cho S-B,et al.Evolutionary Neural Networks for Anomaly Detection Based on the Behavior of a Program [J].IEEE Transactions on Systems,Man,and Cybernetics-Part B:Cybernetics,2006,36(3):559-570
[3] Ramaswamy S,Rastogi R,Shim K.Efficient algorithms for mi-ning outliers from large data sets[J].ACM Sigmoid Record,2000,9(2):427-438
[4] Hung Wen-liang,Yang Min-shen.An Omission Approach forDetecting Outliers in Fuzzy Regression Models [J].Fuzzy Sets and Systems,2006,157(23):3109-3122
[5] Liu Xiao-hui,Cheng Gong-xian,Wu J X.Analyzing OutliersCautiously[J].IEEE Transactions on Knowledge and Data Engineering,2002,14(2):432-437
[6] Breunig M,Kriegel H P,Ng R,et al.LOF:Identifying Density-based Local Outliers,2000[C]∥Proc.of the ACM SIGMOD International Conference on Management of Data.[s.1.]:ACM press,2000:93-104
[7] Tang J,Chen Z,Fu A,et al.Enhancing effectiveness of outlier detections for low-density patterns,2002[C]∥Proceeding of Advances in Knowledge Discovery and Data Mining 6th Pacific Asia Conference,Lecture Notes in Computer Science.Taipei,China,2002:535-548
[8] Ni Wei-wei,Chen Geng,Lu Jie-ping,et al.Local Entropy BasedWeighted Subspace Outlier Mining Algorithm[J].Journal of Computer Research Development,2008,45(7):1189-1194
[9] Papadimitirou S,Kitagawa H,Gibbons P B,et al.LOCI:Fastoutlier detection using the local correlation integral [C]∥Proc of the 19th Int Conf on Data Engineering.Los Alamitos:IEEE Computer Society,2003:315-326
[10] 薛安荣,鞠时光,何伟华,等.局部离群点挖掘算法研究[J].计算机学报,2007,30(8):1455-1463
[11] 胡彩平,秦小麟.一种基于密度的局部离群点检测算法DLOF[J].计算机研究与发展,2010,7(12):2110-2116
[12] 张净,孙志挥,等.基于信息论的高维海量数据离群点挖掘[J].计算机科学,2011,8(7):148-161

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!