Computer Science, 2019, Vol. 46, Issue (3): 260-266. doi: 10.11896/j.issn.1002-137X.2019.03.039

• Artificial Intelligence •

Outlier Detection Algorithm Based on Spectral Embedding and Local Density

LI Chang-jing 1, ZHAO Shu-liang 1, CHI Yun-xian 2

  1. College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024, China
  2. College of Resources and Environmental Science, Hebei Normal University, Shijiazhuang 050024, China
  • Received: 2018-02-11 Revised: 2018-05-28 Online: 2019-03-15 Published: 2019-03-22

Abstract: Outlier detection is an active research topic in data mining. Existing detection algorithms mainly target cases where outliers lie in the original attribute subspace or in linear combinations of the underlying subspaces; when outliers are embedded in local nonlinear subspaces, they are very difficult to detect effectively. To address this problem, this paper first analyzes the shortcomings of the typical spectral embedding algorithm for outlier detection and then proposes an outlier detection algorithm based on spectral embedding and local density. Using an iterative strategy, the algorithm efficiently screens out unimportant eigenvectors and identifies the eigenvectors relevant to finding outliers hidden in local nonlinear subspaces. The local density-based spectral embedding from the previous iteration is used to improve the similarity graph for the next iteration, so that outliers are gradually separated from inliers over the iterations. Simulation results show that the proposed algorithm achieves higher detection accuracy than other typical algorithms and that it is not sensitive to parameter settings.
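
The iterative idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the graph construction, the eigenvector screening rule, or the density measure. The sketch assumes a Gaussian-weighted kNN similarity graph, keeps a fixed number of eigenvectors instead of screening them, and uses a simplified LOF-style density ratio as the outlier score. All function names and the parameters `k`, `dim`, and `n_iter` are hypothetical.

```python
import numpy as np

def knn_similarity(X, k):
    """Symmetric kNN graph with Gaussian weights (an assumed construction)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]        # indices of the k nearest neighbours
    sigma2 = np.median(d2[np.arange(n)[:, None], idx]) + 1e-12
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return np.maximum(W, W.T)                  # symmetrise

def spectral_embedding(W, dim):
    """Embed with eigenvectors of the normalised Laplacian I - D^-1/2 W D^-1/2."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg + 1e-12)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    # Fixed choice: skip the first eigenvector and keep the next `dim`.
    # The paper screens eigenvectors instead; that step is not reproduced here.
    return vecs[:, 1:dim + 1]

def local_density_scores(Y, k):
    """Simplified LOF-style ratio: neighbours' mean density over own density."""
    d = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    knn_dist = np.take_along_axis(d, idx, axis=1).mean(axis=1)
    density = 1.0 / (knn_dist + 1e-12)
    return density[idx].mean(axis=1) / density  # larger means sparser than neighbours

def iterative_outlier_scores(X, k=10, dim=2, n_iter=3):
    """Rebuild the similarity graph from the previous embedding, then score."""
    Y = X
    for _ in range(n_iter):
        W = knn_similarity(Y, k)
        Y = spectral_embedding(W, dim)
    return local_density_scores(Y, k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.uniform(0, 2 * np.pi, 60)
    ring = np.c_[np.cos(t), np.sin(t)] + 0.02 * rng.normal(size=(60, 2))
    X = np.vstack([ring, [[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]]])
    scores = iterative_outlier_scores(X, k=8)
    print(np.argsort(scores)[::-1][:5])        # indices of the highest-scoring points
```

In this sketch the only link between iterations is that each similarity graph is built from the previous embedding; how the paper feeds local density back into the graph is not specified in the abstract.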

Key words: Detection accuracy, Iterative strategy, Local density, Outlier detection, Similarity graph, Spectral embedding

CLC Number: TP393