Computer Science ›› 2021, Vol. 48 ›› Issue (2): 121-127. doi: 10.11896/jsjkx.191100141

• Database & Big Data & Data Science •

Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis

ZOU Cheng-ming1,2,3, CHEN De2   

  1. Hubei Key Laboratory of Transportation Internet of Things Technology, Wuhan 430070, China
  2. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
  3. Peng Cheng Laboratory, Shenzhen, Guangdong 518000, China
  • Received: 2019-11-19  Revised: 2020-04-02  Online: 2021-02-15  Published: 2021-02-04
  • About author: ZOU Cheng-ming, born in 1975, Ph.D., professor, is a member of the China Computer Federation. His main research interests include computer vision, embedded systems, and software theory and methods.
    CHEN De, born in 1995, postgraduate. His main research interests include deep learning and data mining.
  • Supported by:
    The National Key R&D Program of China (2018YFC0704300).

Abstract: Unsupervised anomaly detection on high-dimensional data is one of the most significant challenges in machine learning. Previous approaches that combine a single deep auto-encoder with density estimation have made significant progress, but because they rely on only one auto-encoder, the low-dimensional representations they generate carry insufficient information for the subsequent density estimation task. To address this problem, this paper proposes a mixed auto-encoding Gaussian mixture model (MAGMM). MAGMM replaces the single deep auto-encoder with a mixture of auto-encoders that generates concatenated low-dimensional representations, so that key information from the specific cluster an input sample belongs to is preserved. In addition, it uses an allocation network to constrain the mixture of auto-encoders, so that each sample is assigned to a dominant auto-encoder. With these mechanisms, MAGMM avoids becoming trapped in local optima and reduces reconstruction errors, which facilitates density estimation and improves the accuracy of anomaly detection on high-dimensional data. Experimental results show that the proposed method outperforms DAGMM, improving the standard F1 score by up to 29%.
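
The idea of a mixture of auto-encoders steered by an allocation network can be illustrated with a minimal toy sketch. Everything below is an assumption made for illustration only: the abstract does not specify the architecture, so the linear "encoders", the linear "allocator", the weights, and the choice to scale each code by its allocation probability before concatenation are hypothetical, not the authors' implementation. The sketch shows only how a soft assignment can pick a dominant encoder and how the per-encoder codes can be joined into one representation that a density estimator such as a Gaussian mixture model would then consume.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Apply a weight matrix (list of rows) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def mixed_representation(x, encoders, allocator):
    """Return (allocation weights, dominant index, concatenated code).

    encoders:  list of K hypothetical linear encoders (weight matrices)
    allocator: hypothetical linear allocation network with K output rows
    """
    probs = softmax(linear(allocator, x))         # soft assignment over the K encoders
    codes = [linear(enc, x) for enc in encoders]  # one low-dimensional code per encoder
    # Assumption: scale each code by its allocation probability, then concatenate.
    concat = [p * c for p, code in zip(probs, codes) for c in code]
    dominant = max(range(len(probs)), key=probs.__getitem__)
    return probs, dominant, concat

# Toy setup: 4-dimensional input, K = 2 encoders, each producing a 2-dimensional code.
encoders = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
]
allocator = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

probs, dominant, concat = mixed_representation([3.0, 2.0, 0.5, 0.1], encoders, allocator)
print(dominant, len(concat))  # prints: 0 4
```

In this toy input the first two features dominate, so the allocator favors encoder 0, and the concatenated code has K × 2 = 4 entries. In a trained model the encoders and the allocation network would be deep networks learned jointly; the fixed weights here stand in for them only to keep the example runnable.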

Key words: Data mining, Density estimation, Dimensionality reduction, Gaussian mixture model, Unsupervised anomaly detection

CLC Number: TP391