计算机科学 (Computer Science), 2023, Vol. 50, Issue 11A: 220900034-10. doi: 10.11896/jsjkx.220900034
周士金, 邢红杰
ZHOU Shijin, XING Hongjie
Abstract: Knowledge-distillation-based anomaly detection methods typically take a pre-trained network as the teacher network and use a network with the same architecture and size as the student network; for a test sample, the discrepancy between the teacher and student networks is used to decide whether it is normal or anomalous. However, because the teacher and student networks share the same architecture and size, on the one hand the discrepancy such methods produce on anomalous data tends to be too small, and on the other hand the teacher's pre-training dataset is far larger than the student's training set, which leaves the student network with a large amount of redundant information. To address these problems, the Efficient Channel Attention (ECA) module is introduced into knowledge-distillation-based anomaly detection. Exploiting ECA's cross-channel interaction strategy, a student network that is structurally simpler and smaller than the teacher network is designed, which both effectively captures the features of normal data while removing redundant information and enlarges the discrepancy between the teacher and student networks, thereby improving anomaly detection performance. Experimental results on six image datasets show that, compared with five related methods, the proposed method achieves better detection performance.
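To make the two mechanisms described in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' released implementation: an ECA block that re-weights channels via local cross-channel interaction, a deliberately small student network built from such blocks, and an anomaly score computed from the teacher-student feature discrepancy. The layer sizes, the three-block student, and the per-layer cosine-distance score are illustrative assumptions; the score additionally assumes that teacher features have already been projected to the student's feature shapes (e.g., by 1x1 convolutions).

```python
# Minimal sketch (illustrative assumptions, not the paper's released code):
# ECA channel attention + a small student network + a teacher-student
# feature-discrepancy anomaly score.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ECA(nn.Module):
    """Efficient Channel Attention: a 1-D conv over pooled channel descriptors."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Kernel size adapted to the number of channels, as in ECA-Net.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                                 # x: (N, C, H, W)
        y = F.adaptive_avg_pool2d(x, 1)                   # (N, C, 1, 1) descriptors
        y = self.conv(y.squeeze(-1).transpose(1, 2))      # local cross-channel interaction
        y = torch.sigmoid(y.transpose(1, 2).unsqueeze(-1))
        return x * y                                      # re-weight channels


class SmallStudent(nn.Module):
    """A student deliberately simpler and smaller than the teacher, with ECA blocks
    (hypothetical channel widths)."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.BatchNorm2d(32), nn.ReLU(), ECA(32)),
            nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.BatchNorm2d(64), nn.ReLU(), ECA(64)),
            nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.BatchNorm2d(128), nn.ReLU(), ECA(128)),
        ])

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)
        return feats                                      # per-layer features for matching


def anomaly_score(teacher_feats, student_feats):
    """Per-sample sum of layer-wise cosine distances between teacher and student features.
    Assumes the teacher features were already projected to the student's shapes.
    Larger scores indicate a larger discrepancy, i.e., more likely anomalous."""
    score = 0.0
    for t, s in zip(teacher_feats, student_feats):
        t = F.normalize(t.flatten(1), dim=1)
        s = F.normalize(s.flatten(1), dim=1)
        score = score + (1.0 - (t * s).sum(dim=1))
    return score
```

At test time, an input would be passed through both networks and thresholding `anomaly_score` would separate normal from anomalous samples; the threshold and the feature-alignment layers are design choices left open in this sketch.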