计算机科学 ›› 2019, Vol. 46 ›› Issue (3): 260-266.doi: 10.11896/j.issn.1002-137X.2019.03.039
李长镜1,赵书良1,池云仙2
LI Chang-jing1,ZHAO Shu-liang1,CHI Yun-xian2
摘要: 离群点检测问题是数据挖掘领域的研究热点之一。现有的检测算法主要应用于离群点位于初始属性子空间或底层子空间各种线性组合等情况,当离群点嵌入局部非线性子空间时,进行离群点有效检测的难度很大。为此,文中分析了典型的谱嵌入算法在离群点检测上存在的不足,然后以局部密度为基础,提出了一种基于谱嵌入和局部密度的离群点检测算法。该算法采用迭代策略对不重要的特征向量进行高效筛查,以发现有助于检测出局部非线性子空间离群点的特征向量,并利用上一次迭代获得的基于局部密度的谱嵌入结果来改进下一次迭代的相似度图,经过多次迭代可以将离群点从正常点中分离。仿真实验结果表明,所提算法的检测精度优于当前其他典型算法,且该算法对参数的设置不敏感。
中图分类号:
[1]RAHMANI M,ATIA G K.Randomized robust subspace reco- very and outlier detection for high dimensional data matrices[J].IEEE Transactions on Signal Processing,2017,65(6):1580-1594. [2]FAN F F,LI Z H,CHEN Q,et al.An Outlier-detection Based Approach for Automatic Entity Matching[J].Chinese Journal of Computers,2017,40(10):2197-2211.(in Chinese) 樊峰峰,李战怀,陈群,等.一种基于离群点检测的自动实体匹配方法[J].计算机学报,2017,40(10):2197-2211. [3]TEMPL M,HRON K,FILZMOSER P.Exploratory tools for outlier detection in compositional data with structural zeros[J].Journal of Applied Statistics,2017,44(4):734-752. [4]YANG J H,DENG T Q.A One-Cluster Kernel PCM Based SVDD Method for Outlier Detection [J].Acta Electronica Sinica,2017,45(4):813-819.(in Chinese) 杨金鸿,邓廷权.一种基于单簇核PCM的SVDD离群点检测方法[J].电子学报,2017,45(4):813-819. [5]RO K,ZOU C,WANG Z,et al.Outlier detection for high-dimensional data[J].Biometrika,2015,102(3):589-599. [6]BREUNIG M M,KRIEGEL H P,NG R T,et al.LOF:identifying density-based local outliers[J].ACM Sigmod Record,2010,29(2):93-104. [7]KRIEGEL H P,ZIMEK A.Angle-based outlier detection in high-dimensional data[C]∥Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Las Vegas,Nevada,USA:ACM Press,2008:444-452. [8]DANG X H,MICENKOV B,ASSENT I,et al.Outlier detection with space transformation and spectral analysis[C]∥Proceedings of the 13th SIAM International Conference on Data Mining.Austin,Texas,USA:IEEE Press,2013:225-233. [9]NG A Y,JORDAN M I,WEISS Y.On spectral clustering:Analysis and an algorithm[C]∥26th Annual Conference on Neural Information Processing Systems 2012.Lake Tahoe,Nevada,United States:IEEE Press,2012:849-856. [10]SHI J,MALIK J.Normalized cuts and image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,22(8):888-905. [11]CAMPOS G O,ZIMEK A,SANDER J,et al.On the evaluation of unsupervised outlier detection:measures,datasets,and an empirical study[J].Data Mining and Knowledge Discovery,2016,30(4):891-927. [12]YANG Y,MA Z,YANG Y,et al.Multitask spectral clustering by exploring intertask correlation[J].IEEE Transactions on Cybernetics,2015,45(5):1083-1094. [13]BI W,CAI M,LIU M,et al.A big data clustering algorithm for mitigating the risk of customer churn[J].IEEE Transactions on Industrial Informatics,2016,12(3):1270-1281. [14]GU Y,LIU T,JIA X,et al.Nonlinear multiple kernel learning with multiple-structure-element extended morphological profiles for hyperspectral image classification[J].IEEE Transactions on Geoscience and Remote Sensing,2016,54(6):3235-3247. |
[1] | 李鹏, 刘力军, 黄永东. 基于地标表示的联合谱嵌入和谱旋转的谱聚类算法 Landmark-based Spectral Clustering by Joint Spectral Embedding and Spectral Rotation 计算机科学, 2021, 48(6A): 220-225. https://doi.org/10.11896/jsjkx.210100167 |
[2] | 汤鑫瑶, 张正军, 储杰, 严涛. 基于自然最近邻的密度峰值聚类算法 Density Peaks Clustering Algorithm Based on Natural Nearest Neighbor 计算机科学, 2021, 48(3): 151-157. https://doi.org/10.11896/jsjkx.200100112 |
[3] | 钟颖宇, 陈松灿. 高阶多视图离群点检测 High-order Multi-view Outlier Detection 计算机科学, 2020, 47(9): 99-104. https://doi.org/10.11896/jsjkx.200600170 |
[4] | 陈春涛, 陈优广. 基于影响空间的稳健密度峰值聚类算法 Influence Space Based Robust Fast Search and Density Peak Clustering Algorithm 计算机科学, 2019, 46(11): 216-221. https://doi.org/10.11896/jsjkx.181001846 |
[5] | 苟杰,马自堂,张喆程. PODKNN:面向大数据集的并行离群点检测算法 PODKNN:A Parallel Outlier Detection Algorithm for Large Dataset 计算机科学, 2016, 43(7): 251-254. https://doi.org/10.11896/j.issn.1002-137X.2016.07.045 |
[6] | 杨 军,田振华,李龙杰,王小鹏. 基于网格Laplace的三维几何模型分割 Segmentation of 3D Geometric Models Based on Mesh Laplace 计算机科学, 2015, 42(5): 295-299. https://doi.org/10.11896/j.issn.1002-137X.2015.05.060 |
[7] | 洪 沙,林佳丽,张月良. 基于密度的不确定数据离群点检测研究 Density-based Outlier Detection on Uncertain Data 计算机科学, 2015, 42(5): 230-233. https://doi.org/10.11896/j.issn.1002-137X.2015.05.046 |
[8] | 姜元凯,郑洪源,丁秋林. 一种基于密度的不确定数据离群点检测算法 On Density Based Outlier Detection for Uncertain Data 计算机科学, 2015, 42(4): 172-176. https://doi.org/10.11896/j.issn.1002-137X.2015.04.034 |
[9] | 黄宏涛,吴忠良,万庆生,黄少滨. 基于限界传递相似度图的FCA概念相似度计算方法 FCA Concept Similarity Computation Based on Bounded Transitive Similarity Graph 计算机科学, 2015, 42(1): 285-289. https://doi.org/10.11896/j.issn.1002-137X.2015.01.063 |
[10] | 王敬华,赵新想,张国燕,刘建银. NLOF:一种新的基于密度的局部离群点检测算法 NLOF:A New Density-based Local Outlier Detecting Algorithm 计算机科学, 2013, 40(8): 181-185. |
[11] | 薛安荣 鞠时光. 基于空间约束的离群点挖掘 计算机科学, 2007, 34(6): 207-209. |
[12] | 贺春林. 一种基于视频的车辆检测算法 计算机科学, 2005, 32(5): 243-245. |
|