Computer Science ›› 2023, Vol. 50 ›› Issue (11): 259-268.doi: 10.11896/jsjkx.221000009

• Artificial Intelligence •

Similarity and Consistency by Self-distillation Method

WAN Xu, MAO Yingchi, WANG Zibo, LIU Yi, PING Ping   

  1. Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Nanjing 211100, China
  2. College of Computer and Information, Hohai University, Nanjing 211100, China
  • Received:2022-10-07 Revised:2023-02-26 Online:2023-11-15 Published:2023-11-06
  • About author: WAN Xu, born in 1998, postgraduate, is a member of China Computer Federation. Her main research interest is knowledge graphs. MAO Yingchi, born in 1976, Ph.D, professor, Ph.D supervisor, is a senior member of China Computer Federation. Her main research interests include distributed data processing and edge intelligent computing.
  • Supported by:
    National 14th Five-Year Key Research and Development Program of China (2022YFC3005401), Yunnan Province Key Research and Development Program (202203AA080009, 202202AF080003), Transformation Program of Scientific and Technological Achievements of Jiangsu Province (BA2021002) and Jiangsu Province Key Research and Development Program (BE2020729).

Abstract: Existing self-distillation methods for model compression suffer from high data pre-processing costs and fail to detect local features. To improve classification accuracy, a similarity and consistency by self-distillation (SCD) method is proposed. First, feature maps are learned from different layers of the sample images, and attention maps are derived from the distribution of feature weights. Then, the similarity of attention maps between samples within a mini-batch is computed to obtain a similarity-consistency knowledge matrix; this knowledge is constructed without distorting instance data or extracting samples of the same class, so additional inter-instance knowledge is obtained while avoiding heavy data pre-processing. Finally, the similarity-consistency knowledge matrix is passed unidirectionally between the intermediate layers of the model, allowing shallow layers to mimic deep layers and capture richer contextual scenes and local features, which addresses the problem of missing local feature detection. Experimental results show that the proposed SCD method improves classification accuracy on the public dataset CIFAR-100. Compared with the self-attention distillation (SAD) and similarity-preserving knowledge distillation (SPKD) methods, the average improvement is 1.42%; compared with the be-your-own-teacher (BYOT) and on-the-fly native ensemble (ONE) methods, the average improvement is 1.13%; and compared with the data-distortion guided self-distillation (DDGSD) and class-wise self-knowledge distillation (CS-KD) methods, the average improvement is 1.23%.
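The similarity-consistency computation outlined in the abstract can be summarized in a short sketch. The PyTorch code below is a minimal illustration, not the authors' released implementation: the attention-map definition (channel-wise energy), the use of an MSE objective, and all function names and feature shapes are assumptions made for clarity.

```python
# Hedged sketch of the SCD idea: per-layer attention maps, a pairwise
# similarity-consistency matrix over the mini-batch, and a one-way
# (deep-to-shallow) mimicry loss. Details are illustrative assumptions.
import torch
import torch.nn.functional as F

def attention_map(feature):                      # feature: (B, C, H, W)
    """Collapse channels into a spatial attention map and flatten it."""
    att = feature.pow(2).mean(dim=1)             # (B, H, W): channel-wise energy
    return F.normalize(att.flatten(1), dim=1)    # (B, H*W), L2-normalized per sample

def similarity_matrix(feature):
    """Pairwise similarity of attention maps between samples in the mini-batch."""
    att = attention_map(feature)                 # (B, H*W)
    sim = att @ att.t()                          # (B, B) similarity-consistency matrix
    return F.normalize(sim, p=2, dim=1)          # row-normalize the knowledge matrix

def scd_loss(shallow_feats, deep_feats):
    """Shallow layers mimic the similarity structure of deeper layers (one way only)."""
    loss = 0.0
    for fs, fd in zip(shallow_feats, deep_feats):
        g_s = similarity_matrix(fs)
        g_d = similarity_matrix(fd).detach()     # stop gradient: knowledge flows one way
        loss = loss + F.mse_loss(g_s, g_d)
    return loss

# Usage sketch: the feature lists would be intermediate maps collected with forward hooks.
if __name__ == "__main__":
    b = 8
    shallow = [torch.randn(b, 64, 32, 32), torch.randn(b, 128, 16, 16)]
    deep    = [torch.randn(b, 256, 8, 8),  torch.randn(b, 512, 4, 4)]
    print(scd_loss(shallow, deep))               # scalar regularization term
```

Because both similarity matrices are of size B x B regardless of the spatial resolution of each layer, layers of different depths can be compared directly; the detach() call is what makes the transfer unidirectional (shallow mimics deep, never the reverse).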

Key words: Knowledge distillation, Knowledge representation, Self-distillation, Similarity consistency, Knowledge matrix

CLC Number: TP311
[1]HE K,ZHANG X,REN S,et al.Deep Residual Learning for Image Recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Las Vegas:IEEE Press,2016:770-778.
[2]LI W,ZHU X,GONG S.Person Re-Identification by Deep Joint Learning of Multi-Loss Classification [C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence.Melbourne:Morgan Kaufmann,2017:2194-2200.
[3]LAN X,ZHU X,GONG S.Person Search by Multi-Scale Matching[C]//Proceedings of European Conference on Computer Vision.Munich:Springer Verlag,2018:536-552.
[4]XIE G S,ZHANG Z,LIU L,et al.SRSC:Selective,Robust,and Supervised Constrained Feature Representation for Image Classification[J].IEEE Transactions on Neural Networks and Learning Systems,2020,31(10):4290-4302.
[5]CAI Q,PAN Y,WANG Y,et al.Learning a Unified Sample Weighting Network for Object Detection[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Online:IEEE Press,2020:14173-14182.
[6]LIU Y,CHEN K,LIU C,et al.Structured Knowledge Distillation for Semantic Segmentation[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE Press,2019:2604-2613.
[7]PASSALIS N,TZELEPI M,TEFAS A.Heterogeneous Knowledge Distillation Using Information Flow Modeling[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Greece:IEEE Press,2020:2339-2348.
[8]ZHAO L,PENG X,CHEN Y,et al.Knowledge as Priors:Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Online:IEEE Press,2020:6528-6537.
[9]XU G,LIU Z,LI X,et al.Knowledge Distillation Meets Self-Supervision[C]//Proceedings of European Conference on Computer Vision.Glasgow:Springer,2020:588-604.
[10]ANIL R,PEREYRA G,PASSOS A,et al.Large Scale Distributed Neural Network Training Through Online Distillation[C]//International Conference on Learning Representations.2018:1-12.
[11]CHEN D,MEI J P,WANG C,et al.Online Knowledge Distillation with Diverse Peers[C]//Proceedings of AAAI Conference on Artificial Intelligence.New York:AAAI press,2020:3430-3437.
[12]GUO Q,WANG X,WU Y,et al.Online Knowledge Distillation via Collaborative Learning[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Online:IEEE Press,2020:11017-11026.
[13]WU G,GONG S.Peer Collaborative Learning for Online Knowledge Distillation[C]//Proceedings of AAAI Conference on Artificial Intelligence.2021.
[14]XU T B,LIU C L.Data-Distortion Guided Self-Distillation for Deep Neural Networks[C]//Proceedings of AAAI Conference on Artificial Intelligence.Honolulu:AAAI press,2019:5565-5572.
[15]LEE H,HWANG S J,SHIN J.Rethinking Data Augmentation:Self-Supervision and Self-Distillation[J].arXiv:1910.05872,2019.
[16]CROWLEY E J,GRAY G,STORKEY A J.Moonshine:Distilling with Cheap Convolutions[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems.2018:2888-2898.
[17]BARZ B,RODNER E,GARCIA Y G,et al.Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2018,41(5):1088-1101.
[18]YUN S,PARK J,LEE K,et al.Regularizing Class-Wise Predictions via Self-Knowledge Distillation[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Online:IEEE Press,2020:13873-13882.
[19]XU T B,LIU C L.Deep Neural Network Self-Distillation Exploiting Data Representation Invariance[J].IEEE Transactions on Neural Networks and Learning Systems,2020,33(1):257-269.
[20]LAN X,ZHU X,GONG S.Knowledge Distillation by On-The-Fly Native Ensemble[C]//NeurIPS 2018.2018:7517-7527.
[21]HOU S,PAN X,LOY C C,et al.Learning a Unified Classifier Incrementally via Rebalancing[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE Press,2019:831-839.
[22]ZHANG L,SONG J,GAO A,et al.Be Your Own Teacher:Improve the Performance of Convolutional Neural Networks via Self Distillation[C]//Proceedings of IEEE International Conference on Computer Vision.Seoul:IEEE Press,2019:3712-3721.
[23]HOU Y,MA Z,LIU C,et al.Learning Lightweight Lane Detection CNNs by Self Attention Distillation[C]//Proceedings of IEEE International Conference on Computer Vision.Seoul:IEEE Press,2019:1013-1021.
[24]PARK S,KIM J,HEO Y S.Semantic Segmentation Using Pixel-Wise Adaptive Label Smoothing via Self-Knowledge Distillation for Limited Labeling Data[J].Sensors,2022,22(7):2623.
[25]KRIZHEVSKY A,HINTON G.Learning Multiple Layers of Features from Tiny Images[J/OL].https://www.researchgate.net/publication/306218037_Learning_multiple_layers_of_features_from_tiny_images.
[26]NETZER Y,WANG T,COATES A,et al.Reading Digits in Natural Images with Unsupervised Feature Learning[J/OL].https://www.researchgate.net/publication/266031774_Reading_Digits_in_Natural_Images_with_Unsupervised_Feature_Learning.
[27]XIAO H,RASUL K,VOLLGRAF R.Fashion-MNIST:A Novel Image Dataset for Benchmarking Machine Learning Algorithms[J].arXiv:1708.07747,2017.
[28]LE Y,YANG X.Tiny Imagenet Visual Recognition Challenge[S].CS 231N,2015.
[29]TUNG F,MORI G.Similarity-Preserving Knowledge Distillation[C]//Proceedings of IEEE International Conference on Computer Vision.Seoul:IEEE Press,2019:1365-1374.