Computer Science, 2024, Vol. 51, Issue (4): 359-365. doi: 10.11896/jsjkx.230500034
邢开颜, 陈文
XING Kaiyan, CHEN Wen
Abstract: The imbalanced distribution of positive and negative training samples severely limits the performance of outlier detection models. Outlier detection algorithms based on active learning can automatically synthesize outliers to balance the training data distribution by actively learning the sample distribution. However, traditional active-learning-based detection methods lack any quality assessment or filtering of the synthesized outliers, so the training samples produced by the active learning process contain noise, which degrades the performance of the classification model. To address this problem, this paper proposes a Multi-Generator Active Learning Algorithm Based on Reverse Label Propagation (MG-RLP), which consists of multiple neural-network generators and a discriminator for outlier boundary detection. MG-RLP generates sample data with multiple distribution characteristics through several sub-generators, preventing the mode-collapse problem caused by the overly concentrated training samples of a single generator. Meanwhile, MG-RLP uses a reverse label propagation process to assess the quality of the generated sample points and to select trustworthy synthetic samples. The selected samples are retained in the training set and used to iteratively train the discriminator, improving its outlier detection performance. MG-RLP is compared with six representative outlier detection algorithms on five public datasets; the results show that MG-RLP improves AUC and detection precision by 15% and 22%, respectively, verifying its effectiveness.
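The filtering step described above can be illustrated with a minimal sketch. The paper's exact formulation of reverse label propagation is not reproduced here; the RBF similarity graph, the clamped propagation rule, and the confidence threshold `tau` below are illustrative assumptions standing in for the authors' method. The idea shown is the general one: propagate labels over a graph joining real inliers and generator-synthesized candidates, and keep only candidates that remain confidently labeled as outliers.

```python
import numpy as np

def rbf_affinity(X, gamma=1.0):
    # Pairwise RBF (Gaussian) similarities between all rows of X.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def filter_synthetic_outliers(X_real, X_synth, n_iter=50, gamma=1.0, tau=0.9):
    """Score synthetic candidates by label propagation over a similarity
    graph of real inliers (class 0, clamped) and synthetic points (class 1),
    then keep only candidates whose outlier confidence stays above tau."""
    X = np.vstack([X_real, X_synth])
    n_real = len(X_real)
    W = rbf_affinity(X, gamma)
    np.fill_diagonal(W, 0.0)                  # no self-loops
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    # Soft labels: real inliers start as class 0, candidates as class 1.
    F = np.zeros((len(X), 2))
    F[:n_real, 0] = 1.0
    F[n_real:, 1] = 1.0
    clamp = F[:n_real].copy()
    for _ in range(n_iter):
        F = P @ F
        F[:n_real] = clamp                    # re-clamp known inlier labels
    conf_outlier = F[n_real:, 1] / F[n_real:].sum(axis=1)
    return X_synth[conf_outlier >= tau], conf_outlier
```

In this sketch, a synthetic point generated too close to the inlier cluster absorbs the clamped inlier labels during propagation and is filtered out as noise, while points far from the inliers retain a high outlier confidence and survive into the discriminator's training set.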