Computer Science ›› 2024, Vol. 51 ›› Issue (8): 20-33. doi: 10.11896/jsjkx.230600052

• Database & Big Data & Data Science •

Review of Outlier Detection Algorithms

KONG Lingchao, LIU Guozhu   

  1. School of Information Science and Technology, Qingdao, Shandong 266061, China
  • Received: 2023-06-06  Revised: 2023-12-13  Online: 2024-08-15  Published: 2024-08-13
  • About the authors: KONG Lingchao, born in 1998, postgraduate, is a student member of CCF (No. G8696G). His main research interests include data mining and fault detection.
    LIU Guozhu, born in 1965, Ph.D., professor, master's supervisor. His main research interests include network security and fault detection.
  • Supported by:
    National Natural Science Foundation of China (61973180).

Abstract: Outlier detection is an important research direction in data mining. It aims to discover data points that differ from the majority of a dataset and carry potential analytical value, and to help researchers identify potential problems in the data source. Outlier detection is now widely applied in domains such as fraud detection, smart healthcare, intrusion detection, and fault diagnosis. Drawing on a summary of previous work, this paper first discusses the definition of outliers, their causes, and typical application domains. It reviews the advantages and limitations of classical outlier detection algorithms such as DBSCAN and LOF and of their improved variants, and it analyzes the advantages of deep learning methods for outlier detection. Second, in view of the need to process massive, high-dimensional, and time-series data in the current Internet context, it examines the development status of outlier detection algorithms in these new settings. Finally, it introduces the evaluation metrics for outlier detection algorithms, the role of cost factors in evaluation, and commonly used toolkits and datasets, and it summarizes the challenges and future development directions of outlier detection.

Key words: Outliers, Anomaly detection, Deep learning, Time-series data, Data mining
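
The abstract names LOF as one of the classical density-based detectors under review. As a rough illustration only, the following Python sketch computes Local Outlier Factor scores with the standard library. The function name lof_scores, the parameter k, and the toy points are assumptions made for this example and do not come from the paper. The sketch also simplifies the usual definition by taking exactly k neighbours per point and ignoring distance ties.

```python
import math


def lof_scores(points, k=3):
    """Return a simplified Local Outlier Factor score for every point.

    Simplification (assumption of this sketch): each point uses exactly
    its k nearest neighbours, so ties at the k-distance are ignored.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours = []   # indices of the k nearest neighbours of each point
    k_distance = []   # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / k
        if mean_reach == 0:
            raise ValueError("too many duplicate points for this simplified sketch")
        lrd.append(1.0 / mean_reach)

    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Toy data: a small dense cluster plus one isolated point (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = lof_scores(points, k=3)
for point, score in zip(points, scores):
    print(point, round(score, 3))
```

Scores close to 1 indicate a point whose local density matches that of its neighbours, and scores well above 1 indicate a likely outlier. On the toy data the isolated point (8, 8) receives by far the largest score. For real workloads, a maintained toolkit implementation would normally be preferable to this quadratic-time sketch.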

CLC Number: TP301