Computer Science ›› 2022, Vol. 49 ›› Issue (10): 169-175. doi: 10.11896/jsjkx.210800250

• Computer Graphics & Multimedia •

Mutual Learning Knowledge Distillation Based on Multi-stage Multi-generative Adversarial Network

HUANG Zhong-hao, YANG Xing-yao, YU Jiong, GUO Liang, LI Xiang   

  1. School of Software, Xinjiang University, Urumqi 830008, China
  • Received: 2021-08-30  Revised: 2022-03-07  Online: 2022-10-15  Published: 2022-10-13
  • About author: HUANG Zhong-hao, born in 1997, postgraduate. His main research interests include data compression and recommender systems.
    YANG Xing-yao, born in 1984, Ph.D, associate professor, is a member of China Computer Federation. His main research interests include recommender systems and trust computing.
  • Supported by:
    National Natural Science Foundation of China (61862060, 61966035, 61562086), Education Department Project of Xinjiang Uygur Autonomous Region (XJEDU2016S035) and Doctoral Research Start-up Foundation of Xinjiang University (BS150257).

Abstract: To address the low distillation efficiency, single-stage training, complex training procedures, and difficult convergence of traditional knowledge distillation methods in image classification tasks, this paper proposes a mutual-learning knowledge distillation method based on multi-stage multi-generative adversarial networks (MS-MGANs). Firstly, the whole training process is divided into several stages, and teacher models from different stages are obtained to guide the student model toward better accuracy. Secondly, a layer-wise greedy strategy replaces the traditional end-to-end training mode: a layer-wise training scheme based on convolution blocks reduces the number of parameters to be optimized in each iteration and further improves the distillation efficiency of the model. Finally, a generative adversarial structure is introduced into the knowledge distillation framework, with the teacher model acting as the feature discriminator and the student model as the feature generator, so that the student, by continually imitating the teacher, can match or even surpass the teacher's performance. The proposed method is compared with other state-of-the-art knowledge distillation methods on several public image classification datasets, and the experimental results show that it achieves better image classification performance.
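The adversarial component described in the abstract can be made concrete with a short sketch. The following PyTorch fragment is an illustrative assumption rather than the authors' implementation: it uses a small auxiliary discriminator head over pooled features (whereas the paper casts the teacher model itself in the discriminator role), assumes teacher and student networks that return a (feature map, logits) pair, and omits the multi-stage, layer-wise greedy training that the abstract also describes. The names Teacher, Student, FeatureDiscriminator and the weight alpha are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDiscriminator(nn.Module):
    # Scores a pooled feature vector as coming from the teacher ("real")
    # or from the student ("fake"). Hypothetical helper, not from the paper.
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, feat):
        # feat: (batch, channels, H, W) -> global average pool -> (batch, channels)
        return self.net(feat.mean(dim=(2, 3)))

def adversarial_distillation_step(images, labels, teacher, student, disc,
                                  opt_student, opt_disc, alpha=0.5):
    # One training step: the student plays the generator and tries to produce
    # features the discriminator cannot tell apart from the teacher's.
    with torch.no_grad():
        t_feat, _ = teacher(images)          # teacher features are fixed targets

    s_feat, s_logits = student(images)

    # 1) Discriminator update: teacher features are "real", student features "fake".
    d_real = disc(t_feat)
    d_fake = disc(s_feat.detach())
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # 2) Student (generator) update: fool the discriminator while still
    #    minimizing the ordinary classification loss.
    g_logits = disc(s_feat)
    adv_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    s_loss = F.cross_entropy(s_logits, labels) + alpha * adv_loss
    opt_student.zero_grad()
    s_loss.backward()
    opt_student.step()
    return d_loss.item(), s_loss.item()

In the multi-stage setting the abstract outlines, a step like this would be run within each stage, with teacher snapshots from earlier stages guiding the student before the final teacher takes over; that scheduling is not shown in the sketch above.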

Key words: Mutual learning knowledge distillation, Layer-wise greedy strategy, Generative adversarial network, Model compression, Image classification

CLC Number: TP391