Computer Science ›› 2022, Vol. 49 ›› Issue (10): 169-175. doi: 10.11896/jsjkx.210800250

• Computer Graphics & Multimedia •

Mutual Learning Knowledge Distillation Based on Multi-stage Multi-generative Adversarial Network

HUANG Zhong-hao, YANG Xing-yao, YU Jiong, GUO Liang, LI Xiang   

  1. School of Software, Xinjiang University, Urumqi 830008, China
  • Received: 2021-08-30 Revised: 2022-03-07 Online: 2022-10-15 Published: 2022-10-13
  • Corresponding author: YANG Xing-yao (yangxy@xju.edu.cn)
  • About author: HUANG Zhong-hao, born in 1997, postgraduate (hzhao@stu.xju.edu.cn). His main research interests include data compression and recommendation systems.
    YANG Xing-yao, born in 1984, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include recommender systems and trust computing.
  • Supported by:
    National Natural Science Foundation of China (61862060, 61966035, 61562086), Education Department Project of Xinjiang Uygur Autonomous Region (XJEDU2016S035) and Doctoral Research Start-up Foundation of Xinjiang University (BS150257).

Abstract: Aiming at the problems of low knowledge distillation efficiency, a single-stage training scheme, complex training processes, and difficult convergence in traditional knowledge distillation for image classification, this paper designs a mutual learning knowledge distillation method based on multi-stage multi-generative adversarial networks (MS-MGANs). First, the whole training process is divided into several stages, and teacher models from different stages are obtained to progressively guide the student model toward better accuracy. Second, a layer-wise greedy strategy replaces the traditional end-to-end training mode: layer-wise training based on convolution blocks reduces the number of parameters to be optimized in each iteration, further improving distillation efficiency. Finally, a generative adversarial structure is introduced into the knowledge distillation framework, with the teacher model as the feature discriminator and the student model as the feature generator, so that the student model can approach or even surpass the teacher model's performance while continuously imitating it. The proposed method is compared with other popular knowledge distillation methods on several public image classification datasets, and the experimental results show that it achieves better image classification performance.
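
The abstract describes three moving parts: stage-wise teacher guidance, layer-wise greedy training of convolution blocks, and a GAN-style loop with the teacher side as the feature discriminator and the student as the feature generator. The following minimal PyTorch sketch illustrates how one such stage could be wired; it is our reading of the abstract, not the authors' released code, and `FeatureDiscriminator`, the block widths, the loss weighting, and the dummy loader are all assumptions.

```python
# Minimal sketch of one MS-MGANs-style distillation stage (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    """A convolution block; the layer-wise greedy strategy trains one per stage."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class FeatureDiscriminator(nn.Module):
    """Scores whether a feature map came from the teacher ("real") or the
    student ("fake"); a small assumed head standing in for the teacher side."""
    def __init__(self, channels):
        super().__init__()
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, 1))

    def forward(self, f):
        return self.head(f)

def distill_stage(student_sub, teacher_sub, disc, loader, epochs=1, lr=1e-3):
    """Adversarial distillation for one stage: the student sub-network tries to
    produce features the discriminator cannot tell apart from the teacher's."""
    opt_s = torch.optim.Adam([p for p in student_sub.parameters()
                              if p.requires_grad], lr=lr)
    opt_d = torch.optim.Adam(disc.parameters(), lr=lr)
    teacher_sub.eval()
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                f_t = teacher_sub(x)              # "real" teacher features
            f_s = student_sub(x)                  # "fake" student features
            ones = torch.ones(x.size(0), 1)
            zeros = torch.zeros(x.size(0), 1)

            # 1) Discriminator step: separate teacher from student features.
            d_loss = (F.binary_cross_entropy_with_logits(disc(f_t), ones)
                      + F.binary_cross_entropy_with_logits(disc(f_s.detach()), zeros))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()

            # 2) Generator step: adversarial term plus direct feature imitation.
            g_loss = (F.binary_cross_entropy_with_logits(disc(f_s), ones)
                      + F.mse_loss(f_s, f_t))
            opt_s.zero_grad(); g_loss.backward(); opt_s.step()

# Hypothetical multi-stage driver: at stage s, earlier student blocks are frozen
# and only block s is optimized (layer-wise greedy); the teacher blocks are
# assumed pre-trained, and the random-tensor loader stands in for a real dataset.
widths = [3, 16, 32, 64]
teacher_blocks = [conv_block(widths[i], widths[i + 1]) for i in range(3)]
student_blocks = [conv_block(widths[i], widths[i + 1]) for i in range(3)]
loader = [(torch.randn(8, 3, 32, 32), None) for _ in range(4)]

for s in range(3):
    for blk in student_blocks[:s]:
        blk.requires_grad_(False)                 # freeze already-distilled blocks
    distill_stage(nn.Sequential(*student_blocks[: s + 1]),
                  nn.Sequential(*teacher_blocks[: s + 1]),
                  FeatureDiscriminator(widths[s + 1]), loader)
```

Detaching `f_s` in the discriminator step keeps the student fixed while the discriminator learns, mirroring standard GAN training; the added `mse_loss` term reflects the feature-imitation flavor of the mutual learning described in the abstract, though the paper's exact loss composition is not given here.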

Key words: Mutual learning knowledge distillation, Layer-wise greedy strategy, Generative adversarial network, Model compression, Image classification

CLC Number:

  • TP391