Computer Science, 2022, Vol. 49, Issue (10): 169-175. doi: 10.11896/jsjkx.210800250
HUANG Zhong-hao, YANG Xing-yao, YU Jiong, GUO Liang, LI Xiang
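Abstract: In image classification, traditional knowledge distillation methods distill knowledge inefficiently, rely on a single undifferentiated training stage, and use a complex training process that is hard to converge. To address these problems, this paper designs a mutual-learning knowledge distillation method based on multi-stage multi-generative adversarial networks (MS-MGANs). First, the whole training process is divided into multiple stages, yielding teacher models from different stages that progressively guide the student model toward higher accuracy. Second, a layer-wise greedy strategy replaces the conventional end-to-end training mode: training one convolutional block at a time reduces the number of parameters that must be optimized in each stage's iterations, further improving distillation efficiency. Finally, a generative adversarial structure is introduced into the distillation framework, with the teacher model acting as the feature discriminator and the student model as the feature generator. As the student continually imitates the teacher, it moves closer to, and can even surpass, the teacher's performance. Comparative experiments against other popular knowledge distillation methods on several public image classification datasets show that the proposed method achieves better image classification performance.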
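The three ingredients in the abstract can be made concrete with a minimal pure-Python sketch. It rests on stated assumptions and is not the authors' MS-MGANs implementation. It covers three pieces:

(i) The standard temperature-softened soft-target loss of Hinton et al., which a teacher checkpoint would use to guide the student.

(ii) How layer-wise greedy training shrinks the set of parameters updated per stage. This uses a hypothetical three-block student (`STUDENT_BLOCKS`) and an assumed pairing of stage s with teacher checkpoint s (`greedy_schedule`, with placeholder labels `"t0"`, `"t1"`, `"t2"`).

(iii) Generic non-saturating GAN losses. A hypothetical linear head (`w`, `b`) stands in for the teacher-side feature discriminator, because the abstract does not say how the discriminator is attached to the teacher.

The temperature `T=4.0` and all shapes are illustrative assumptions.

```python
import math

# --- (i) Soft-target distillation loss (Hinton et al. style) ---
def softmax(logits, T=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened outputs, scaled by T**2."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return T * T * sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))


# --- (ii) Layer-wise greedy training: parameters updated per stage ---
def conv_params(k, c_in, c_out):
    """Weights plus biases of one k x k convolution."""
    return k * k * c_in * c_out + c_out


# Hypothetical student: each block is a list of (kernel, c_in, c_out) convs.
STUDENT_BLOCKS = [
    [(3, 3, 32), (3, 32, 32)],
    [(3, 32, 64), (3, 64, 64)],
    [(3, 64, 128), (3, 128, 128)],
]


def block_params(block):
    """Trainable parameters in one convolutional block."""
    return sum(conv_params(k, c_in, c_out) for k, c_in, c_out in block)


def greedy_schedule(blocks, teacher_checkpoints):
    """Yield (stage, teacher checkpoint, trainable params) for each stage.

    Assumed pairing: stage s unfreezes only student block s and is guided
    by the s-th intermediate teacher checkpoint; other blocks stay frozen.
    """
    for s, (block, ckpt) in enumerate(zip(blocks, teacher_checkpoints)):
        yield s, ckpt, block_params(block)


# --- (iii) Adversarial feature losses (discriminator on the teacher side) ---
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator(feat, w, b):
    """Hypothetical linear head: probability that `feat` is a teacher feature."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, feat)) + b)


def d_loss(teacher_feat, student_feat, w, b):
    """Discriminator labels teacher features real (1), student features fake (0)."""
    return (-math.log(discriminator(teacher_feat, w, b))
            - math.log(1.0 - discriminator(student_feat, w, b)))


def g_loss(student_feat, w, b):
    """Non-saturating generator loss: the student wants its features judged real."""
    return -math.log(discriminator(student_feat, w, b))


if __name__ == "__main__":
    end_to_end = sum(block_params(b) for b in STUDENT_BLOCKS)
    print("end-to-end trainable params:", end_to_end)
    for stage, ckpt, n in greedy_schedule(STUDENT_BLOCKS, ["t0", "t1", "t2"]):
        print(f"stage {stage}: guided by {ckpt}, trainable params = {n}")
    print("kd_loss:", kd_loss([2.0, 0.5, -1.0], [3.0, 0.0, -1.0]))
```

With this hypothetical configuration, an end-to-end step updates all 287,008 student parameters. The three greedy stages update 10,144, 55,424 and 221,440 parameters respectively, so the per-stage saving depends on how parameters are distributed across blocks. In a real framework, the per-stage student objective would typically combine the distillation term and the generator term with tuned weights. The weighting is not given in the abstract.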