Computer Science ›› 2023, Vol. 50 ›› Issue (7): 53-59. doi: 10.11896/jsjkx.220900027

• Database & Big Data & Data Science •

Dually Encoded Semi-supervised Anomaly Detection

LI Hui, LI Wengen, GUAN Jihong

  1. College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2022-09-05 Revised: 2022-11-30 Online: 2023-07-15 Published: 2023-07-05
  • Corresponding author: LI Wengen (lwengen@tongji.edu.cn)
  • Author contact: 2230760@tongji.edu.cn
  • About author: LI Hui, born in 1997, postgraduate, is a member of China Computer Federation. His main research interests include anomaly detection and big data analysis. LI Wengen, born in 1987, Ph.D., assistant professor, is a member of China Computer Federation. His main research interest is spatio-temporal data management and analysis.
  • Supported by:
    Shanghai Pujiang Program (20PJ1414300), National Natural Science Foundation of China (U1936205) and National Key R&D Program of China (2021YFC3300300).

Abstract: Anomaly detection is a widely studied problem in machine learning and plays an important role in industrial production, food safety, disease monitoring, and other fields. Most recent anomaly detection methods train a semi-supervised detection model jointly on a small number of available labeled samples and a large number of unlabeled samples. However, existing semi-supervised anomaly detection models are mostly built on deep learning frameworks. On low-dimensional datasets, which do not carry enough feature information, these models struggle to learn accurate data boundaries, and their detection performance suffers. To address this problem, a dually encoded semi-supervised anomaly detection (DE-SAD) model is proposed. DE-SAD makes full use of the small amount of available labeled data together with a large amount of unlabeled data for semi-supervised learning. Its dual-encoding stage constrains the model to learn a more accurate latent manifold distribution of normal data, which effectively widens the gap between normal and abnormal data. DE-SAD shows superior anomaly detection performance on multiple anomaly detection datasets from different fields, especially on low-dimensional data, and its AUROC is up to 4.6% higher than that of the current state-of-the-art methods.

Key words: Anomaly detection, Semi-supervised learning, Autoencoder, Low-dimensional data
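The abstract reports results as AUROC. As a minimal sketch of how that metric can be computed from anomaly scores, the snippet below uses the pairwise-ranking definition: the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal sample, with ties counted as one half. The function name `auroc`, the convention that label 1 marks an anomaly, and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def auroc(scores, labels):
    """Pairwise-ranking AUROC.

    Assumptions (not from the paper): label 1 = anomaly, label 0 = normal,
    and a higher score means "more anomalous". Ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomaly ranked above normal sample
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Made-up scores for four samples; the last two are labeled anomalies.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auroc(scores, labels))  # 3 of 4 anomaly/normal pairs are ordered correctly -> 0.75
```

The double loop is quadratic in the number of samples, which is fine for illustration; a sort-based rank computation gives the same value on larger datasets.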

CLC Number: 

  • TP391