Computer Science ›› 2024, Vol. 51 ›› Issue (4): 359-365. doi: 10.11896/jsjkx.230500034

• Information Security •

Multi-generator Active Learning Algorithm Based on Reverse Label Propagation and Its Application in Outlier Detection

XING Kaiyan, CHEN Wen

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  • Received: 2023-05-06 Revised: 2023-09-11 Online: 2024-04-15 Published: 2024-04-10
  • Corresponding author: CHEN Wen (wenchen@scu.edu.cn)
  • About the author: (xingkaiyan@stu.scu.edu.cn)
  • Supported by:
    National Key Research and Development Program of China (020YFB1805405, 2019QY0800) and National Natural Science Foundation of China (U19A2068, 61872255).

Abstract: The imbalance between positive and negative training samples severely limits the performance of outlier detection models. Outlier detection algorithms based on active learning address this by learning the sample distribution and automatically synthesizing outliers to balance the training data. However, traditional detection methods based on active learning neither assess the quality of the synthesized outliers nor filter them, so the synthesized training samples contain noise that degrades the classification model. To address these problems, a multi-generator active learning algorithm based on reverse label propagation (MG-RLP) is proposed, consisting of multiple neural network generators and a discriminator for outlier boundary detection. MG-RLP uses multiple sub-generators to produce samples with diverse distribution characteristics, which prevents the mode collapse that arises when the training samples synthesized by a single generator cluster too tightly. In addition, MG-RLP applies a reverse label propagation process to assess the quality of the generated sample points and select credible synthetic samples. The selected samples are retained in the training set and used to train the discriminator iteratively, improving outlier detection performance. MG-RLP is compared with six typical outlier detection algorithms on five public datasets. The results show that MG-RLP improves AUC and detection precision by 15% and 22%, respectively, which verifies its effectiveness.

Key words: Outlier detection, Active learning, Generative adversarial networks, Label propagation
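
The abstract reports gains in AUC and detection precision but does not define either metric. As a reading aid only, the sketch below shows one common way such metrics are computed for an outlier detector. It assumes that label 1 marks an outlier and that a higher score means more anomalous. The helper names `roc_auc` and `precision_at_k` are hypothetical, and precision over the top-k ranked samples is an assumed stand-in that may differ from the paper's own definition of detection precision.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random outlier outscores a random inlier.

    Assumption: labels use 1 for outlier and 0 for inlier, and a higher
    score means more anomalous. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_k(labels: Sequence[int], scores: Sequence[float], k: int) -> float:
    """Fraction of true outliers among the k highest-scoring samples."""
    if not 0 < k <= len(labels):
        raise ValueError("k must be between 1 and the number of samples")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Toy data (made up for illustration, not taken from the paper).
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(roc_auc(labels, scores))            # 7 of 8 outlier/inlier pairs ranked correctly
print(precision_at_k(labels, scores, 2))  # 1 of the top 2 is a true outlier
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a toy example; a rank-based formulation would be preferable on full datasets.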

CLC Number:

  • TP181