Computer Science (计算机科学), 2023, Vol. 50, Issue (11): 259-268. doi: 10.11896/jsjkx.221000009
WAN Xu, MAO Yingchi, WANG Zibo, LIU Yi, PING Ping
Abstract: To address the high data-preprocessing cost, the missing local-feature detection, and the low classification accuracy of traditional self-distillation methods, a model self-distillation method based on similarity consistency (Similarity and Consistency by Self-Distillation, SCD) is proposed to improve model classification accuracy. First, feature maps are learned from different layers of the sample images, and attention maps are obtained from the feature weight distributions. Then, the similarity between the attention maps of samples within a mini-batch is computed to obtain a similarity-consistency knowledge matrix, constructing knowledge based on similarity consistency. In this way, no distortion of instance data and no extraction of same-class data are needed to obtain additional inter-instance knowledge, avoiding the high training cost and complexity caused by heavy data preprocessing. Finally, the similarity-consistency knowledge matrices are passed unidirectionally between intermediate layers of the model, letting the shallow similarity matrices mimic the deep ones. This refines low-level similarities and captures richer contextual scenes and local features, solving the missing local-feature detection problem and realizing self-distillation with single-stage, one-way knowledge transfer. Experimental results on the public datasets CIFAR100 and TinyImageNet verify the effectiveness of the similarity-consistency knowledge extracted by SCD for model self-distillation: compared with self-attention distillation (Self Attention Distillation, SAD) and similarity-preserving knowledge distillation (Similarity-Preserving Knowledge Distillation, SPKD), classification accuracy improves by 1.42% on average; compared with deeply supervised self-distillation (Be Your Own Teacher, BYOT) and on-the-fly native ensemble knowledge distillation (On-the-fly Native Ensemble, ONE), it improves by 1.13% on average; compared with data-distortion guided self-distillation for deep neural networks (Data-Distortion Guided Self-Distillation, DDGSD) and class-wise self-knowledge distillation (Class-wise Self-Knowledge Distillation, CS-KD), it improves by 1.23% on average.
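For concreteness, the following is a minimal PyTorch sketch of the similarity-consistency idea described in the abstract. The function names (attention_map, similarity_matrix, scd_loss), the squared-activation attention, and the MSE mimicry loss are illustrative assumptions under which the pipeline can be reproduced, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attention_map(feature: torch.Tensor) -> torch.Tensor:
    # Collapse a feature map (B, C, H, W) into a spatial attention map (B, H*W)
    # by summing squared activations over channels, then L2-normalizing.
    # (Assumed attention definition; the paper derives attention maps from the
    # feature weight distribution of each layer.)
    att = feature.pow(2).sum(dim=1).flatten(start_dim=1)
    return F.normalize(att, p=2, dim=1)

def similarity_matrix(att: torch.Tensor) -> torch.Tensor:
    # Pairwise cosine similarity between the attention maps of all samples in
    # the mini-batch: a (B, B) similarity-consistency knowledge matrix. Being
    # B x B regardless of spatial resolution, matrices from different layers
    # are directly comparable, with no data distortion or class grouping.
    sim = att @ att.t()
    return F.normalize(sim, p=2, dim=1)

def scd_loss(shallow_feats, deep_feat) -> torch.Tensor:
    # One-way, single-stage transfer: each shallow layer's similarity matrix
    # mimics the deepest layer's matrix; detaching the target stops gradients
    # so knowledge only flows from deep to shallow.
    target = similarity_matrix(attention_map(deep_feat)).detach()
    losses = [F.mse_loss(similarity_matrix(attention_map(f)), target)
              for f in shallow_feats]
    return sum(losses) / len(losses)

# Toy usage with random features from three depths of a CNN:
if __name__ == "__main__":
    B = 8
    f_shallow = torch.randn(B, 64, 32, 32)
    f_middle = torch.randn(B, 128, 16, 16)
    f_deep = torch.randn(B, 256, 8, 8)
    print(scd_loss([f_shallow, f_middle], f_deep))
```

In training, a loss of this form would be added to the usual cross-entropy objective, so the network acts as its own teacher in a single stage rather than requiring a separately pre-trained teacher model.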
[1] HE K, ZHANG X, REN S, et al. Deep Residual Learning for Image Recognition [C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE Press, 2016: 770-778.
[2] LI W, ZHU X, GONG S. Person Re-Identification by Deep Joint Learning of Multi-Loss Classification [C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne: Morgan Kaufmann, 2017: 2194-2200.
[3] LAN X, ZHU X, GONG S. Person Search by Multi-Scale Matching [C]//Proceedings of European Conference on Computer Vision. Munich: Springer Verlag, 2018: 536-552.
[4] XIE G S, ZHANG Z, LIU L, et al. SRSC: Selective, Robust, and Supervised Constrained Feature Representation for Image Classification [J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(10): 4290-4302.
[5] CAI Q, PAN Y, WANG Y, et al. Learning a Unified Sample Weighting Network for Object Detection [C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online: IEEE Press, 2020: 14173-14182.
[6] LIU Y, CHEN K, LIU C, et al. Structured Knowledge Distillation for Semantic Segmentation [C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE Press, 2019: 2604-2613.
[7] PASSALIS N, TZELEPI M, TEFAS A. Heterogeneous Knowledge Distillation Using Information Flow Modeling [C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Greece: IEEE Press, 2020: 2339-2348.
[8] ZHAO L, PENG X, CHEN Y, et al. Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge [C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online: IEEE Press, 2020: 6528-6537.
[9] XU G, LIU Z, LI X, et al. Knowledge Distillation Meets Self-Supervision [C]//Proceedings of European Conference on Computer Vision. Glasgow: Springer, 2020: 588-604.
[10] ANIL R, PEREYRA G, PASSOS A, et al. Large Scale Distributed Neural Network Training Through Online Distillation [C]//International Conference on Learning Representations. 2018: 1-12.
[11] CHEN D, MEI J P, WANG C, et al. Online Knowledge Distillation with Diverse Peers [C]//Proceedings of AAAI Conference on Artificial Intelligence. New York: AAAI Press, 2020: 3430-3437.
[12] GUO Q, WANG X, WU Y, et al. Online Knowledge Distillation via Collaborative Learning [C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online: IEEE Press, 2020: 11017-11026.
[13] WU G, GONG S. Peer Collaborative Learning for Online Knowledge Distillation [C]//Proceedings of AAAI Conference on Artificial Intelligence. 2021.
[14] XU T B, LIU C L. Data-Distortion Guided Self-Distillation for Deep Neural Networks [C]//Proceedings of AAAI Conference on Artificial Intelligence. Honolulu: AAAI Press, 2019: 5565-5572.
[15] LEE H, HWANG S J, SHIN J. Rethinking Data Augmentation: Self-Supervision and Self-Distillation [J]. arXiv:1910.05872, 2019.
[16] CROWLEY E J, GRAY G, STORKEY A J. Moonshine: Distilling with Cheap Convolutions [C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018: 2888-2898.
[17] BARZ B, RODNER E, GARCIA Y G, et al. Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(5): 1088-1101.
[18] YUN S, PARK J, LEE K, et al. Regularizing Class-Wise Predictions via Self-Knowledge Distillation [C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online: IEEE Press, 2020: 13873-13882.
[19] XU T B, LIU C L. Deep Neural Network Self-Distillation Exploiting Data Representation Invariance [J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 33(1): 257-269.
[20] LAN X, ZHU X, GONG S. Knowledge Distillation by On-the-Fly Native Ensemble [C]//NeurIPS 2018. 2018: 7517-7527.
[21] HOU S, PAN X, LOY C C, et al. Learning a Unified Classifier Incrementally via Rebalancing [C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE Press, 2019: 831-839.
[22] ZHANG L, SONG J, GAO A, et al. Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation [C]//Proceedings of IEEE International Conference on Computer Vision. Seoul: IEEE Press, 2019: 3712-3721.
[23] HOU Y, MA Z, LIU C, et al. Learning Lightweight Lane Detection CNNs by Self Attention Distillation [C]//Proceedings of IEEE International Conference on Computer Vision. Seoul: IEEE Press, 2019: 1013-1021.
[24] PARK S, KIM J, HEO Y S. Semantic Segmentation Using Pixel-Wise Adaptive Label Smoothing via Self-Knowledge Distillation for Limited Labeling Data [J]. Sensors, 2022, 22(7): 2623.
[25] KRIZHEVSKY A, HINTON G. Learning Multiple Layers of Features from Tiny Images [J/OL]. https://www.researchgate.net/publication/306218037_Learning_multiple_layers_of_features_from_tiny_images.
[26] NETZER Y, WANG T, COATES A, et al. Reading Digits in Natural Images with Unsupervised Feature Learning [J/OL]. https://www.researchgate.net/publication/266031774_Reading_Digits_in_Natural_Images_with_Unsupervised_Feature_Learning.
[27] XIAO H, RASUL K, VOLLGRAF R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms [J]. arXiv:1708.07747, 2017.
[28] LE Y, YANG X. Tiny ImageNet Visual Recognition Challenge [S]. CS 231N, 2015.
[29] TUNG F, MORI G. Similarity-Preserving Knowledge Distillation [C]//Proceedings of IEEE International Conference on Computer Vision. Seoul: IEEE Press, 2019: 1365-1374.