Multi-generator Active Learning Algorithm Based on Reverse Label Propagation and Its Application in Outlier Detection

doi:10.11896/jsjkx.230500034

Abstract: The imbalanced distribution of positive and negative training samples greatly limits the performance of outlier detection models. Outlier detection algorithms based on active learning can automatically synthesize outliers to balance the training data by actively learning the sample distribution. However, traditional active-learning-based detection methods do not assess or filter the quality of the synthetic outliers, so noise in the synthetic training samples degrades the performance of the classification model. To address these problems, a multi-generator adversarial learning algorithm based on reverse label propagation (MG-RLP) is proposed, which consists of multiple neural network generators and a discriminator for outlier boundary detection. MG-RLP uses multiple sub-generators to generate sample data with multi-distribution features, preventing the mode collapse caused by excessive aggregation of the training samples synthesized by a single generator. At the same time, the proposed method uses reverse label propagation to evaluate the quality of the generated sample points and select credible synthetic samples. The selected samples are retained in the training set to iteratively train the discriminator and improve its outlier detection performance. MG-RLP is compared with six typical outlier detection algorithms on five public datasets. The results show that the proposed algorithm improves AUC and detection precision by 15% and 22%, respectively, which verifies its effectiveness.
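The abstract does not spell out how reverse label propagation scores a synthetic sample, so the following is only a minimal sketch of the filtering idea, not the MG-RLP procedure itself. It assumes a single propagation step on a Gaussian-affinity graph in which real samples are labeled as inliers (0) and synthetic samples as outliers (1); a synthetic point is kept only if the labels propagated from its neighbours still say "outlier". The function names, the `sigma` bandwidth, and the `threshold` value are all hypothetical choices made for illustration.

```python
import numpy as np


def propagated_outlier_score(real, synthetic, sigma=1.0):
    """Score each synthetic point by the outlier label mass its neighbours send it.

    Assumption (not from the paper): real points carry label 0, synthetic
    points carry label 1, and one row-normalised propagation step is applied.
    """
    X = np.vstack([real, synthetic])
    y = np.concatenate([np.zeros(len(real)), np.ones(len(synthetic))])

    # Dense pairwise squared distances and Gaussian (RBF) affinities.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # a point does not vote for itself

    # Row-normalise into a transition matrix; guard against isolated points.
    T = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)

    # One propagation step: weighted share of neighbours labeled "outlier".
    scores = T @ y
    return scores[len(real):]


def filter_synthetic(real, synthetic, threshold=0.5, sigma=1.0):
    """Keep synthetic points whose propagated outlier score reaches the threshold."""
    scores = propagated_outlier_score(real, synthetic, sigma)
    return synthetic[scores >= threshold], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 0.3, size=(50, 2))            # dense inlier cluster
    far = np.array([3.0, 3.0]) + rng.normal(0.0, 0.1, size=(5, 2))
    noisy = np.array([[0.0, 0.0]])                        # lands inside the inliers
    synthetic = np.vstack([far, noisy])

    kept, scores = filter_synthetic(real, synthetic)
    print(len(kept), "of", len(synthetic), "synthetic samples kept")
```

In this toy setup the synthetic point that falls inside the inlier cluster receives almost all of its label mass from real samples and is discarded, while the points far from the cluster support one another and are retained. A full pipeline would feed the retained points back into discriminator training on each iteration, as the abstract describes.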

Key words: Outlier detection, Active learning, Generative adversarial networks, Label propagation

CLC Number: TP181

XING Kaiyan, CHEN Wen. Multi-generator Active Learning Algorithm Based on Reverse Label Propagation and Its Application in Outlier Detection[J]. Computer Science, 2024, 51(4): 359-365.


