Computer Science, 2025, Vol. 52, Issue (5): 241-247. doi: 10.11896/jsjkx.240700059
SI Yuehang1, CHENG Qing1,2, HUANG Jincai1
Abstract: Knowledge distillation has attracted attention in key areas such as model compression for object recognition. Through an in-depth study of knowledge distillation efficiency and an analysis of how knowledge is transferred between teacher and student models, it is found that a well-configured teaching-assistant model can significantly narrow the performance gap between teacher and student. However, an unreasonable choice of the size and number of teaching-assistant models can negatively affect the student. Therefore, an innovative multi-teaching-assistant knowledge distillation training framework is proposed, which dynamically adjusts the number and size of teaching assistants to optimize the transfer of knowledge from teacher to student and thereby improve the training accuracy of the student model. In addition, a dynamic stopping strategy for knowledge distillation is designed: student models trained with different methods serve as control groups, enabling a personalized choice of the epoch at which distillation stops. This further improves the training efficiency of the student model and yields a more compact and efficient multi-teaching-assistant knowledge distillation framework. Experiments on public datasets demonstrate the effectiveness of the proposed dynamic multi-teaching-assistant configuration method for knowledge distillation.
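The following is a minimal PyTorch sketch of the chained teacher-to-assistant-to-student distillation that the abstract describes, together with a simple loss-plateau stop check standing in for the control-group-based dynamic stopping strategy. The toy MLP widths, the temperature T, the loss weight alpha, the patience threshold, and the synthetic data are illustrative assumptions, not the paper's actual models, datasets, or stopping criterion.

# Illustrative sketch only: chained multi-teaching-assistant knowledge distillation
# with a simple dynamic-stop check. Sizes, hyperparameters and the stop rule are
# assumptions for illustration, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(width):
    # Toy classifier; width stands in for model capacity (teacher > assistants > student).
    return nn.Sequential(nn.Linear(32, width), nn.ReLU(), nn.Linear(width, 10))

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Standard soft-label distillation loss: KL divergence between
    # temperature-softened distributions plus hard-label cross-entropy.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

def distill(teacher, student, data, labels, epochs=200, patience=5):
    # Train `student` to mimic `teacher`; stop early once the distillation loss
    # stops improving (a stand-in for the paper's dynamic stopping strategy,
    # which uses control-group students rather than a loss plateau).
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    best, stale = float("inf"), 0
    teacher.eval()
    for _ in range(epochs):
        opt.zero_grad()
        with torch.no_grad():
            t_logits = teacher(data)
        loss = kd_loss(student(data), t_logits, labels)
        loss.backward()
        opt.step()
        if loss.item() < best - 1e-4:
            best, stale = loss.item(), 0
        else:
            stale += 1
            if stale >= patience:   # dynamic stop: further distillation no longer helps
                break
    return student

if __name__ == "__main__":
    torch.manual_seed(0)
    data, labels = torch.randn(256, 32), torch.randint(0, 10, (256,))
    # Teacher -> two assistants of decreasing width -> student. The number and
    # width of assistants are exactly what the paper's framework adjusts
    # dynamically; here they are fixed to keep the sketch short.
    chain = [mlp(w) for w in (256, 128, 64, 16)]
    # Briefly fit the teacher on hard labels so there is something to distil.
    teacher, opt = chain[0], torch.optim.Adam(chain[0].parameters(), lr=1e-3)
    for _ in range(50):
        opt.zero_grad()
        F.cross_entropy(teacher(data), labels).backward()
        opt.step()
    for bigger, smaller in zip(chain[:-1], chain[1:]):
        distill(bigger, smaller, data, labels)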