Computer Science ›› 2023, Vol. 50 ›› Issue (11): 259-268. doi: 10.11896/jsjkx.221000009

• Artificial Intelligence •

Model Self-distillation Method Based on Similarity Consistency

WAN Xu, MAO Yingchi, WANG Zibo, LIU Yi, PING Ping   

  1. Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Nanjing 211100, China
    College of Computer and Information, Hohai University, Nanjing 211100, China
  • Received: 2022-10-07 Revised: 2023-02-26 Online: 2023-11-15 Published: 2023-11-06
  • Corresponding author: MAO Yingchi (yingchimao@hhu.edu.cn)
  • About author: (211307040041@hhu.edu.cn)
  • Supported by:
    National 14th Five-year Key Research and Development Program of China (2022YFC3005401), Yunnan Province Key Research and Development Program (202203AA080009, 202202AF080003), Transformation Program of Scientific and Technological Achievements of Jiangsu Province (BA2021002) and Jiangsu Province Key Research and Development Program (BE2020729)

Similarity and Consistency by Self-distillation Method

WAN Xu, MAO Yingchi, WANG Zibo, LIU Yi, PING Ping   

  1. Key Laboratory of Water Big Data Technology of Ministry of Water Resources,Nanjing 211100,China
    College of Computer and Information,Hohai University,Nanjing 211100,China
  • Received:2022-10-07 Revised:2023-02-26 Online:2023-11-15 Published:2023-11-06
  • About author: WAN Xu, born in 1998, postgraduate, is a member of China Computer Federation. Her main research interest is knowledge graphs. MAO Yingchi, born in 1976, Ph.D, professor, Ph.D supervisor, is a senior member of China Computer Federation. Her main research interests include distributed data processing and edge intelligent computing.
  • Supported by:
    National 14th Five-year Key Research and Development Program of China(2022YFC3005401), Yunnan Province Key Research and Development Program(202203AA080009,202202AF080003),Transformation Program of Scientific and Technological Achievements of Jiangsu Province (BA2021002) and Jiangsu Province Key Research and Development Program(BE2020729).

Abstract: To address the high data pre-processing cost, missing local-feature detection and low classification accuracy of traditional self-distillation methods, a model self-distillation method based on similarity consistency (Similarity and Consistency by Self-Distillation, SCD) is proposed to improve model classification accuracy. First, feature maps are learned from different layers of the sample images, and attention maps are obtained from the distribution of feature weights. Then, the similarity of attention maps between samples within a mini-batch is computed to obtain a similarity-consistency knowledge matrix, constructing knowledge based on similarity consistency. This removes the need to distort instance data or to extract data of the same class in order to acquire additional inter-instance knowledge, thus avoiding the high training cost and training complexity caused by heavy data pre-processing. Finally, the similarity-consistency knowledge matrix is transferred unidirectionally between the intermediate layers of the model, letting the shallow-layer similarity matrices mimic the deep-layer ones. This refines the low-level similarities, captures richer contextual scenes and local features, solves the problem of missing local-feature detection, and realizes self-distillation with single-stage, one-way knowledge transfer. Experimental results on the public datasets CIFAR100 and TinyImageNet verify the effectiveness of the similarity-consistency knowledge extracted by SCD for model self-distillation: compared with self-attention distillation (Self Attention Distillation, SAD) and similarity-preserving knowledge distillation (Similarity-Preserving Knowledge Distillation, SPKD), classification accuracy improves by 1.42% on average; compared with the deeply supervised self-distillation method Be Your Own Teacher (BYOT) and the on-the-fly native ensemble distillation method (On-the-fly Native Ensemble, ONE), it improves by 1.13% on average; compared with data-distortion guided self-distillation for deep neural networks (Data-Distortion Guided Self-Distillation, DDGSD) and class-wise self-knowledge distillation (Class-wise Self-Knowledge Distillation, CS-KD), it improves by 1.23% on average.
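To make the knowledge-construction step concrete, here is a minimal PyTorch sketch of one plausible reading of it: an attention map is pooled from a feature map by averaging squared activations over channels, and the pairwise similarities of the flattened attention maps within a mini-batch form the similarity-consistency knowledge matrix. The pooling rule and the row-normalized Gram matrix are assumptions in the spirit of attention transfer and SPKD, not the paper's exact definitions.

import torch
import torch.nn.functional as F

def attention_map(feature_map: torch.Tensor) -> torch.Tensor:
    # Collapse a (B, C, H, W) feature map into a (B, H*W) spatial attention
    # map by averaging squared activations over the channel dimension.
    b = feature_map.size(0)
    att = feature_map.pow(2).mean(dim=1)            # (B, H, W)
    return F.normalize(att.reshape(b, -1), dim=1)   # flatten and L2-normalize

def similarity_matrix(att: torch.Tensor) -> torch.Tensor:
    # Pairwise similarities of attention maps within the mini-batch,
    # giving a (B, B) similarity-consistency knowledge matrix.
    g = att @ att.t()                               # cosine similarities
    return F.normalize(g, p=2, dim=1)               # row-wise L2 normalization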

Key words: Knowledge distillation, Knowledge representation, Self-distillation, Similarity consistency, Knowledge matrix

Abstract: Due to the high data pre-processing cost and missing local-feature detection of existing self-distillation methods for model compression, a similarity and consistency by self-distillation (SCD) method is proposed to improve model classification accuracy. Firstly, different layers of the sample images are learned to obtain feature maps, and attention maps are derived from the distribution of feature weights. Then, the similarity of attention maps between samples within a mini-batch is calculated to obtain a similarity-consistency knowledge matrix, and the similarity-consistency-based knowledge is constructed without distorting the instance data or extracting data of the same class to acquire additional inter-instance knowledge, thus avoiding a large amount of data pre-processing work. Finally, the similarity-consistency knowledge matrix is passed unidirectionally between intermediate layers of the model, allowing shallow layers to mimic deep layers and capture richer contextual scenes and local features, which solves the problem of missing local-feature detection. Experimental results show that the proposed SCD method improves classification accuracy on the public datasets CIFAR100 and TinyImageNet. Compared with the self-attention distillation (SAD) method and the similarity-preserving knowledge distillation (SPKD) method, the average improvement is 1.42%. Compared with the be your own teacher (BYOT) method and the on-the-fly native ensemble (ONE) method, the average improvement is 1.13%. Compared with the data-distortion guided self-distillation (DDGSD) method and the class-wise self-knowledge distillation (CS-KD) method, the average improvement is 1.23%.
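Building on the attention_map and similarity_matrix helpers sketched above, the single-stage, one-way transfer described here could be written as a loss in which each shallow layer's similarity matrix regresses toward the detached matrix of the deepest layer, added to the ordinary cross-entropy term. The MSE criterion, the set of tapped layers and the weight lambda_scd are assumptions for illustration, not the paper's exact objective.

import torch
import torch.nn.functional as F

def scd_loss(feature_maps):
    # feature_maps: list of intermediate (B, C_i, H_i, W_i) tensors, ordered shallow to deep.
    # attention_map() and similarity_matrix() are defined in the sketch above.
    mats = [similarity_matrix(attention_map(f)) for f in feature_maps]
    target = mats[-1].detach()                      # deepest layer; no gradient flows back into it
    losses = [F.mse_loss(m, target) for m in mats[:-1]]
    return torch.stack(losses).mean()

# Hypothetical use inside a training step:
#   logits, feats = model(images)                   # feats: tapped intermediate feature maps
#   loss = F.cross_entropy(logits, labels) + lambda_scd * scd_loss(feats)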

Key words: Knowledge distillation, Knowledge representation, Self-distillation, Similarity consistency, Knowledge matrix

CLC number: TP311
[1]HE K,ZHANG X,REN S,et al.Deep Residual Learning for Image Recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Las Vegas:IEEE Press,2016:770-778.
[2]LI W,ZHU X,GONG S.Person Re-Identification by Deep Joint Learning of Multi-Loss Classification [C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence.Melbourne:Morgan Kaufmann,2017:2194-2200.
[3]LAN X,ZHU X,GONG S.Person Search by Multi-Scale Matching[C]//Proceedings of European Conference on Computer Vision.Munich:Springer Verlag,2018:536-552.
[4]XIE G S,ZHANG Z,LIU L,et al.SRSC:Selective,Robust,and Supervised Constrained Feature Representation for Image Classification[J].IEEE Transactions on Neural Networks and Learning Systems,2020,31(10):4290-4302.
[5]CAI Q,PAN Y,WANG Y,et al.Learning a Unified Sample Weighting Network for Object Detection[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Online:IEEE Press,2020:14173-14182.
[6]LIU Y,CHEN K,LIU C,et al.Structured Knowledge Distillation for Semantic Segmentation[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE Press,2019:2604-2613.
[7]PASSALIS N,TZELEPI M,TEFAS A.Heterogeneous Knowledge Distillation Using Information Flow Modeling[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Greece:IEEE Press,2020:2339-2348.
[8]ZHAO L,PENG X,CHEN Y,et al.Knowledge as Priors:Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Online:IEEE Press,2020:6528-6537.
[9]XU G,LIU Z,LI X,et al.Knowledge Distillation Meets Self-Supervision[C]//Proceedings of European Conference on Computer Vision.Glasgow:Springer,2020:588-604.
[10]ANIL R,PEREYRA G,PASSOS A,et al.Large Scale Distributed Neural Network Training Through Online Distillation[C]//International Conference on Learning Representations.2018:1-12.
[11]CHEN D,MEI J P,WANG C,et al.Online Knowledge Distillation with Diverse Peers[C]//Proceedings of AAAI Conference on Artificial Intelligence.New York:AAAI press,2020:3430-3437.
[12]GUO Q,WANG X,WU Y,et al.Online Knowledge Distillation via Collaborative Learning[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Online:IEEE Press,2020:11017-11026.
[13]WU G,GONG S.Peer Collaborative Learning for Online Knowledge Distillation[C]//Proceedings of AAAI Conference on Artificial Intelligence.2021.
[14]XU T B,LIU C L.Data-Distortion Guided Self-Distillation for Deep Neural Networks[C]//Proceedings of AAAI Conference on Artificial Intelligence.Honolulu:AAAI press,2019:5565-5572.
[15]LEE H,HWANG S J,SHIN J.Rethinking Data Augmentation:Self-Supervision and Self-Distillation[J].arXiv:1910.05872,2019.
[16]CROWLEY E J,GRAY G,STORKEY A J.Moonshine:Distilling with Cheap Convolutions[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems.2018:2888-2898.
[17]BARZ B,RODNER E,GARCIA Y G,et al.Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2018,41(5):1088-1101.
[18]YUN S,PARK J,LEE K,et al.Regularizing Class-Wise Predictions via Self-Knowledge Distillation[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Online:IEEE Press,2020:13873-13882.
[19]XU T B,LIU C L.Deep Neural Network Self-Distillation Exploiting Data Representation Invariance[J].IEEE Transactions on Neural Networks and Learning Systems,2020,33(1):257-269.
[20]LAN X,ZHU X,GONG S.Knowledge Distillation by On-The-Fly Native Ensemble[C]//NeurIPS 2018.2018:7517-7527.
[21]HOU S,PAN X,LOY C C,et al.Learning a Unified Classifier Incrementally via Rebalancing[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE Press,2019:831-839.
[22]ZHANG L,SONG J,GAO A,et al.Be Your Own Teacher:Improve the Performance of Convolutional Neural Networks via Self Distillation[C]//Proceedings of IEEE International Conference on Computer Vision.Seoul:IEEE Press,2019:3712-3721.
[23]HOU Y,MA Z,LIU C,et al.Learning Lightweight Lane Detection CNNs by Self Attention Distillation[C]//Proceedings of IEEE International Conference on Computer Vision.Seoul:IEEE Press,2019:1013-1021.
[24]PARK S,KIM J,HEO Y S.Semantic Segmentation Using Pixel-Wise Adaptive Label Smoothing via Self-Knowledge Distillation for Limited Labeling Data[J].Sensors,2022,22(7):2623.
[25]KRIZHEVSKY A,HINTON G.Learning Multiple Layers of Features from Tiny Images[J/OL].https://www.researchgate.net/publication/306218037_Learning_multiple_layers_of_features_from_tiny_images.
[26]NETZER Y,WANG T,COATES A,et al.Reading Digits in Natural Images with Unsupervised Feature Learning[J/OL].https://www.researchgate.net/publication/266031774_Reading_Digits_in_Natural_Images_with_Unsupervised_Feature_Learning.
[27]XIAO H,RASUL K,VOLLGRAF R.Fashion-MNIST:A Novel Image Dataset for Benchmarking Machine Learning Algorithms[J].arXiv:1708.07747,2017.
[28]LE Y,YANG X.Tiny ImageNet Visual Recognition Challenge[S].CS 231N,2015.
[29]TUNG F,MORI G.Similarity-Preserving Knowledge Distillation[C]//Proceedings of IEEE International Conference on Computer Vision.Seoul:IEEE Press,2019:1365-1374.