Computer Science, 2021, Vol. 48, Issue (2): 121-127. doi: 10.11896/jsjkx.191100141
ZOU Cheng-ming1,2,3, CHEN De2
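Abstract: Unsupervised anomaly detection in high-dimensional data is one of the major challenges in machine learning. Previous methods that combine a single deep autoencoder with density estimation have made significant progress. However, because they generate the low-dimensional representation with only one deep autoencoder, that representation does not carry enough information for the subsequent density estimation task. To address this problem, this paper proposes a Mixed Auto-encoding Gaussian Mixture Model (MAGMM). Instead of a single deep autoencoder, MAGMM uses mixed autoencoders to generate a concatenated low-dimensional representation, so it can preserve the key cluster-specific information of the input samples. In addition, it uses an assignment network to constrain the mixed autoencoders so that each sample is assigned to one dominant autoencoder. With these mechanisms, MAGMM avoids getting trapped in local optima and reduces the reconstruction error, which benefits the density estimation task and improves the accuracy of anomaly detection on high-dimensional data. Experimental results show that the proposed method outperforms DAGMM and improves the standard F1 score by 29%.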
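The density estimation stage mentioned above can be illustrated with a minimal NumPy sketch. It assumes MAGMM scores samples the way DAGMM does: the mixture weights, means, and covariances are estimated from soft memberships γ, and each sample's energy E(z) = -log Σ_k φ_k N(z | μ_k, Σ_k) serves as its anomaly score (higher means more anomalous). The mixed autoencoders and the assignment network are not reproduced here. The 2-D points stand in for the concatenated low-dimensional representation, and `soft_memberships` is a hypothetical substitute for the estimation network. The function names, the `eps` ridge, and the 99th-percentile threshold are illustrative choices, not details from the paper.

```python
import numpy as np


def gmm_params(z, gamma):
    """Estimate mixture weights, means and covariances from soft memberships.

    z: (N, D) low-dimensional representations
    gamma: (N, K) soft memberships, each row sums to 1
    """
    nk = gamma.sum(axis=0)                       # (K,) effective count per component
    phi = nk / z.shape[0]                        # (K,) mixture weights
    mu = (gamma.T @ z) / nk[:, None]             # (K, D) component means
    diff = z[:, None, :] - mu[None, :, :]        # (N, K, D) deviations from each mean
    cov = np.einsum("nk,nkd,nke->kde", gamma, diff, diff) / nk[:, None, None]
    return phi, mu, cov


def sample_energy(z, phi, mu, cov, eps=1e-6):
    """E(z) = -log sum_k phi_k N(z | mu_k, cov_k); higher means more anomalous."""
    d = z.shape[1]
    cov = cov + eps * np.eye(d)                  # small ridge keeps covariances invertible
    inv = np.linalg.inv(cov)                     # (K, D, D) stacked inverses
    _, logdet = np.linalg.slogdet(cov)           # (K,) log-determinants
    diff = z[:, None, :] - mu[None, :, :]        # (N, K, D)
    maha = np.einsum("nkd,kde,nke->nk", diff, inv, diff)  # squared Mahalanobis distances
    log_comp = np.log(phi)[None, :] - 0.5 * (maha + logdet[None, :] + d * np.log(2 * np.pi))
    # log-sum-exp over components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    return -(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1)))


def soft_memberships(z, centers, temperature=1.0):
    """Hypothetical stand-in for the estimation network: softmax over -distance^2."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)
    g = np.exp(logits)
    return g / g.sum(axis=1, keepdims=True)


# Toy representation: two tight clusters plus two injected outliers (indices 400, 401)
rng = np.random.default_rng(0)
normal = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(200, 2)),
                    rng.normal([4.0, 4.0], 0.5, size=(200, 2))])
outliers = np.array([[2.0, 2.0], [8.0, -3.0]])
z = np.vstack([normal, outliers])

gamma = soft_memberships(z, centers=np.array([[0.0, 0.0], [4.0, 4.0]]))
phi, mu, cov = gmm_params(z, gamma)
energy = sample_energy(z, phi, mu, cov)

threshold = np.percentile(energy, 99)   # flag roughly the top 1% highest-energy samples
flagged = np.where(energy > threshold)[0]
print(flagged)
```

In this toy setup the two injected points lie far from both clusters, so they receive the highest energies and are flagged. In DAGMM the memberships come from a trained network and the GMM parameters are estimated jointly with the autoencoder during training, rather than from fixed inputs as in this sketch.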