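Abstract: The density-based local outlier detection algorithm LOF has high time complexity and is poorly suited to outlier detection on large-scale and high-dimensional datasets. Based on an analysis of LOF, this paper proposes a new local outlier detection algorithm, NLOF. Its main idea is that, when querying the neighborhood of a data object, already-known information is reused as far as possible to speed up the neighborhood queries of nearby objects. All neighborhood computations and lookups in NLOF follow this idea. First, the dataset is preprocessed with the clustering algorithm DBSCAN to obtain a preliminary set of outliers. Then, the local outlier factor computation from LOF is used to measure how strongly each object in this preliminary set is a local outlier. During this computation, a leave-one-out partition information entropy increment is introduced: the leave-one-out entropy difference determines the weight of each attribute, giving a concrete quantification of attribute weights, and a weighted distance is used when computing distances between objects. NLOF is thoroughly validated on real datasets. The results show that the algorithm improves outlier detection accuracy, reduces time complexity, and detects local outliers effectively.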
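The DBSCAN preprocessing stage can be sketched as follows. This is a minimal, brute-force DBSCAN in plain Python that assumes Euclidean distance; the function name `dbscan_noise` and the parameters `eps` and `min_pts` are illustrative, and the abstract does not say which distance or parameter values NLOF uses at this stage. The points DBSCAN leaves as noise form the preliminary outlier set, so local outlier factors only need to be computed for these indices.

```python
import math

def dbscan_noise(data, eps, min_pts):
    """Minimal Euclidean DBSCAN (hypothetical helper); returns indices left as noise."""
    n = len(data)

    def region(i):
        # eps-neighbourhood of point i, including i itself.
        return [j for j in range(n) if math.dist(data[i], data[j]) <= eps]

    NOISE = -1
    labels = [None] * n  # None means "not yet visited"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: density-reachable but not core
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is core: keep expanding through it
                queue.extend(neighbours)
        cluster += 1
    # Points still labelled NOISE form the preliminary outlier set.
    return [i for i in range(n) if labels[i] == NOISE]
```

For example, with four points on the corners of a unit square plus the point (10, 10), `eps=1.5` and `min_pts=3`, only the distant point is returned as noise.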
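One plausible reading of the leave-one-out entropy weighting is sketched below. The abstract does not give the exact formula, so the following are assumptions: numeric attributes are discretized into equal-width bins, the entropy of an attribute subset is the Shannon entropy of the equivalence classes (groups of records with equal values) it induces, and the weight of attribute a is its increment H(A) - H(A minus a), normalized so the weights sum to 1. The names `entropy_weights`, `partition_entropy`, `discretize` and `bins` are hypothetical.

```python
import math
from collections import Counter

def discretize(column, bins):
    """Map a numeric column to equal-width bin ids (assumed discretization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in column]

def partition_entropy(rows, attrs):
    """Shannon entropy of the partition of rows induced by equal values on attrs."""
    n = len(rows)
    classes = Counter(tuple(r[a] for a in attrs) for r in rows)
    return -sum((c / n) * math.log(c / n) for c in classes.values())

def entropy_weights(data, bins=3):
    """Hypothetical leave-one-out weights: w_a proportional to H(A) - H(A minus a)."""
    m = len(data[0])
    columns = [discretize([row[a] for row in data], bins) for a in range(m)]
    rows = list(zip(*columns))  # discretized records
    all_attrs = list(range(m))
    h_all = partition_entropy(rows, all_attrs)
    deltas = []
    for a in all_attrs:
        rest = [b for b in all_attrs if b != a]
        # Dropping attribute a can only merge classes, so the entropy cannot grow;
        # clamp at 0 to absorb floating-point noise.
        deltas.append(max(0.0, h_all - partition_entropy(rows, rest)))
    total = sum(deltas)
    if total == 0:
        return [1.0 / m] * m  # no attribute refines the partition: equal weights
    return [d / total for d in deltas]
```

Under this reading, an attribute that is constant over the data never splits an equivalence class, so its increment and weight are 0, while two attributes that refine the partition symmetrically receive equal weights.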
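The scoring step can be sketched as the standard LOF of Breunig et al. computed under a weighted distance. The weighted Euclidean form d_w(x, y) = sqrt(sum_a w_a (x_a - y_a)^2) is an assumption, since the abstract only states that a weighted distance is used. Scores are returned only for the candidate indices (for example, the preliminary outlier set), while k-neighbourhoods are taken over the whole dataset. This sketch uses a brute-force O(n^2) distance matrix and does not implement NLOF's neighbourhood-query reuse; `lof_scores` and `weighted_distance` are hypothetical names.

```python
import math

def weighted_distance(x, y, w):
    """Assumed weighted Euclidean distance: sqrt(sum_a w_a * (x_a - y_a)^2)."""
    return math.sqrt(sum(wa * (xa - ya) ** 2 for xa, ya, wa in zip(x, y, w)))

def lof_scores(data, weights, k, candidates=None):
    """LOF under a weighted distance, reported only for candidates (requires 1 <= k < n)."""
    n = len(data)
    dist = [[weighted_distance(data[i], data[j], weights) for j in range(n)]
            for i in range(n)]
    # k-distance and k-distance neighbourhood (ties at the k-distance are included).
    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neigh.append([j for d, j in others if d <= kd])
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist_k(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neigh[i]]
        mean_reach = sum(reach) / len(reach)
        # Infinite density only occurs with more than k exact duplicates.
        lrd.append(math.inf if mean_reach == 0 else 1.0 / mean_reach)
    if candidates is None:
        candidates = range(n)
    scores = {}
    for i in candidates:
        # LOF: mean ratio of the neighbours' density to the point's own density.
        ratios = [lrd[o] / lrd[i] for o in neigh[i]]
        scores[i] = sum(ratios) / len(ratios)
    return scores
```

For example, on the four corners of a unit square plus the point (10, 10), with `k=2` and weights `[0.5, 0.5]`, each corner scores exactly 1 while the distant point scores about 13.2, well above the value of 1 expected for points inside a cluster of uniform density.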