计算机科学 ›› 2019, Vol. 46 ›› Issue (3): 260-266.doi: 10.11896/j.issn.1002-137X.2019.03.039

• 人工智能 • 上一篇    下一篇

一种基于谱嵌入和局部密度的离群点检测算法

李长镜1,赵书良1,池云仙2   

  1. (河北师范大学数学与信息科学学院 石家庄 050024)1
    (河北师范大学资源与环境科学学院 石家庄 050024)2
  • 收稿日期:2018-02-11 修回日期:2018-05-28 出版日期:2019-03-15 发布日期:2019-03-22
  • 作者简介:李长镜(1990-),男,硕士生,主要研究方向为数据挖掘、大数据;池云仙(1987-),女,博士生,主要研究方向为数据挖掘、地理信息处理。
  • 基金资助:
    国家自然科学基金(71271067),国家社科基金重大项目(13&ZD091),河北省高等学校科学技术研究项目(QN2014196)资助

Outlier Detection Algorithm Based on Spectral Embedding and Local Density

LI Chang-jing1,ZHAO Shu-liang1,CHI Yun-xian2   

  1. (College of Mathematics and Information Science,Hebei Normal University,Shijiangzhuang 050024,China)1
    (College of Resources and Environmental Science,Hebei Normal University,Shijiangzhuang 050024,China)2
  • Received:2018-02-11 Revised:2018-05-28 Online:2019-03-15 Published:2019-03-22

摘要: 离群点检测问题是数据挖掘领域的研究热点之一。现有的检测算法主要应用于离群点位于初始属性子空间或底层子空间各种线性组合等情况,当离群点嵌入局部非线性子空间时,进行离群点有效检测的难度很大。为此,文中分析了典型的谱嵌入算法在离群点检测上存在的不足,然后以局部密度为基础,提出了一种基于谱嵌入和局部密度的离群点检测算法。该算法采用迭代策略对不重要的特征向量进行高效筛查,以发现有助于检测出局部非线性子空间离群点的特征向量,并利用上一次迭代获得的基于局部密度的谱嵌入结果来改进下一次迭代的相似度图,经过多次迭代可以将离群点从正常点中分离。仿真实验结果表明,所提算法的检测精度优于当前其他典型算法,且该算法对参数的设置不敏感。

关键词: 迭代策略, 检测精度, 局部密度, 离群点检测, 谱嵌入, 相似度图

Abstract: Outlier detection is one of the hot topics in the field of data mining.The existing detection algorithms are mainly applied to the cases where outliers lie in initial attribute subspace or various linear combinations of underlying subspace,when the outliers are embedded in local nonlinear subspace,it is very difficult to detect the outliers effectively.To solve this problem,the shortcomings of typical spectral embedding algorithm for outlier detection were firstly analyzed,and then on the basis of local density,an outlier detection algorithm based on spectral embedding and local density was proposed.The algorithm which uses iterative strategy can efficiently screen unimportant eigenvectors and discover eigenvectors that are relevant for finding outliers hidden in local non-linear subspaces,and the local density-based spectral embedding from a previous iteration is used for improving the similarity graph for the next iteration,such that outliers are gradually segregated from inliers during these iterations.The simulation results show that the detection accuracy of the proposed algorithm is better than other typical algorithms,and it is not sensitive to the parameter setting.

Key words: Detection accuracy, Iterative strategy, Local density, Outlier detection, Similarity graph, Spectral embedding

中图分类号: 

  • TP393
[1]RAHMANI M,ATIA G K.Randomized robust subspace reco-
very and outlier detection for high dimensional data matrices[J].IEEE Transactions on Signal Processing,2017,65(6):1580-1594.
[2]FAN F F,LI Z H,CHEN Q,et al.An Outlier-detection Based Approach for Automatic Entity Matching[J].Chinese Journal of Computers,2017,40(10):2197-2211.(in Chinese)
樊峰峰,李战怀,陈群,等.一种基于离群点检测的自动实体匹配方法[J].计算机学报,2017,40(10):2197-2211.
[3]TEMPL M,HRON K,FILZMOSER P.Exploratory tools for outlier detection in compositional data with structural zeros[J].Journal of Applied Statistics,2017,44(4):734-752.
[4]YANG J H,DENG T Q.A One-Cluster Kernel PCM Based
SVDD Method for Outlier Detection [J].Acta Electronica Sinica,2017,45(4):813-819.(in Chinese)
杨金鸿,邓廷权.一种基于单簇核PCM的SVDD离群点检测方法[J].电子学报,2017,45(4):813-819.
[5]RO K,ZOU C,WANG Z,et al.Outlier detection for high-dimensional data[J].Biometrika,2015,102(3):589-599.
[6]BREUNIG M M,KRIEGEL H P,NG R T,et al.LOF:identifying density-based local outliers[J].ACM Sigmod Record,2010,29(2):93-104.
[7]KRIEGEL H P,ZIMEK A.Angle-based outlier detection in
high-dimensional data[C]∥Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Las Vegas,Nevada,USA:ACM Press,2008:444-452.
[8]DANG X H,MICENKOV B,ASSENT I,et al.Outlier detection with space transformation and spectral analysis[C]∥Proceedings of the 13th SIAM International Conference on Data Mining.Austin,Texas,USA:IEEE Press,2013:225-233.
[9]NG A Y,JORDAN M I,WEISS Y.On spectral clustering:Analysis and an algorithm[C]∥26th Annual Conference on Neural Information Processing Systems 2012.Lake Tahoe,Nevada,United States:IEEE Press,2012:849-856.
[10]SHI J,MALIK J.Normalized cuts and image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,22(8):888-905.
[11]CAMPOS G O,ZIMEK A,SANDER J,et al.On the evaluation of unsupervised outlier detection:measures,datasets,and an empirical study[J].Data Mining and Knowledge Discovery,2016,30(4):891-927.
[12]YANG Y,MA Z,YANG Y,et al.Multitask spectral clustering by exploring intertask correlation[J].IEEE Transactions on Cybernetics,2015,45(5):1083-1094.
[13]BI W,CAI M,LIU M,et al.A big data clustering algorithm for mitigating the risk of customer churn[J].IEEE Transactions on Industrial Informatics,2016,12(3):1270-1281.
[14]GU Y,LIU T,JIA X,et al.Nonlinear multiple kernel learning with multiple-structure-element extended morphological profiles for hyperspectral image classification[J].IEEE Transactions on Geoscience and Remote Sensing,2016,54(6):3235-3247.
[1] 李鹏, 刘力军, 黄永东.
基于地标表示的联合谱嵌入和谱旋转的谱聚类算法
Landmark-based Spectral Clustering by Joint Spectral Embedding and Spectral Rotation
计算机科学, 2021, 48(6A): 220-225. https://doi.org/10.11896/jsjkx.210100167
[2] 汤鑫瑶, 张正军, 储杰, 严涛.
基于自然最近邻的密度峰值聚类算法
Density Peaks Clustering Algorithm Based on Natural Nearest Neighbor
计算机科学, 2021, 48(3): 151-157. https://doi.org/10.11896/jsjkx.200100112
[3] 钟颖宇, 陈松灿.
高阶多视图离群点检测
High-order Multi-view Outlier Detection
计算机科学, 2020, 47(9): 99-104. https://doi.org/10.11896/jsjkx.200600170
[4] 陈春涛, 陈优广.
基于影响空间的稳健密度峰值聚类算法
Influence Space Based Robust Fast Search and Density Peak Clustering Algorithm
计算机科学, 2019, 46(11): 216-221. https://doi.org/10.11896/jsjkx.181001846
[5] 苟杰,马自堂,张喆程.
PODKNN:面向大数据集的并行离群点检测算法
PODKNN:A Parallel Outlier Detection Algorithm for Large Dataset
计算机科学, 2016, 43(7): 251-254. https://doi.org/10.11896/j.issn.1002-137X.2016.07.045
[6] 杨 军,田振华,李龙杰,王小鹏.
基于网格Laplace的三维几何模型分割
Segmentation of 3D Geometric Models Based on Mesh Laplace
计算机科学, 2015, 42(5): 295-299. https://doi.org/10.11896/j.issn.1002-137X.2015.05.060
[7] 洪 沙,林佳丽,张月良.
基于密度的不确定数据离群点检测研究
Density-based Outlier Detection on Uncertain Data
计算机科学, 2015, 42(5): 230-233. https://doi.org/10.11896/j.issn.1002-137X.2015.05.046
[8] 姜元凯,郑洪源,丁秋林.
一种基于密度的不确定数据离群点检测算法
On Density Based Outlier Detection for Uncertain Data
计算机科学, 2015, 42(4): 172-176. https://doi.org/10.11896/j.issn.1002-137X.2015.04.034
[9] 黄宏涛,吴忠良,万庆生,黄少滨.
基于限界传递相似度图的FCA概念相似度计算方法
FCA Concept Similarity Computation Based on Bounded Transitive Similarity Graph
计算机科学, 2015, 42(1): 285-289. https://doi.org/10.11896/j.issn.1002-137X.2015.01.063
[10] 王敬华,赵新想,张国燕,刘建银.
NLOF:一种新的基于密度的局部离群点检测算法
NLOF:A New Density-based Local Outlier Detecting Algorithm
计算机科学, 2013, 40(8): 181-185.
[11] 薛安荣 鞠时光.
基于空间约束的离群点挖掘

计算机科学, 2007, 34(6): 207-209.
[12] 贺春林.
一种基于视频的车辆检测算法

计算机科学, 2005, 32(5): 243-245.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!