Computer Science ›› 2025, Vol. 52 ›› Issue (5): 241-247. doi: 10.11896/jsjkx.240700059

• Artificial Intelligence •

Multi-assistant Dynamic Setting Method for Knowledge Distillation

SI Yuehang1, CHENG Qing1,2, HUANG Jincai1   

  1. Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
  2. Hunan Advanced Technology Research Institute, Changsha 410072, China
  • Received: 2024-07-09 Revised: 2024-11-07 Online: 2025-05-15 Published: 2025-05-12
  • Corresponding author: CHENG Qing (chengqing@nudt.edu.cn)
  • About author: SI Yuehang (siyuehang@nudt.edu.cn), born in 2000, Ph.D. His main research interests include data fusion and knowledge processing.
    CHENG Qing, born in 1986, associate professor, is a member of CCF (No.31422G). His main research interests include knowledge reasoning and intelligent Q&A.

Abstract: Knowledge distillation is receiving increasing attention in key areas such as model compression for object recognition. Through an in-depth study of the efficiency of knowledge distillation and an analysis of the characteristics of knowledge transfer between the teacher and student models, it is found that a well-chosen assistant model can significantly reduce the performance gap between the teacher and the student. However, an unreasonable choice of the scale and number of assistant models can negatively affect the student. Therefore, this paper proposes an innovative multi-assistant knowledge distillation training framework that optimizes the transfer of knowledge from the teacher to the student by dynamically adjusting the number and scale of assistant models, thereby improving the training accuracy of the student model. In addition, this paper designs a dynamic stopping strategy for knowledge distillation: student models trained with different methods serve as a control group, enabling a personalized choice of the epoch at which distillation stops, which further improves the training efficiency of the student model and yields a more streamlined and efficient multi-assistant knowledge distillation framework. Experiments on public datasets demonstrate the effectiveness of the proposed multi-assistant dynamic setting method for knowledge distillation.
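Implementation details are not given on this page. The following is a minimal PyTorch sketch of the sequential teacher-to-assistants-to-student distillation that a multi-assistant framework of this kind builds on; the temperature T, weight alpha, optimizer settings, and fixed epoch budget are illustrative assumptions, and the dynamic adjustment of assistant number and scale described in the abstract is not reproduced here.

# Minimal sketch (not the paper's official code) of sequential multi-assistant
# knowledge distillation: the teacher trains the largest assistant, each assistant
# trains the next smaller one, and the last assistant trains the student.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Hinton-style distillation loss: soft-target KL term plus hard-label CE term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def distill(teacher, student, loader, epochs, device="cpu"):
    """Train `student` to mimic `teacher` on `loader` for a fixed number of epochs."""
    teacher.eval()
    student.train()
    opt = torch.optim.SGD(student.parameters(), lr=0.05, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                t_logits = teacher(x)
            loss = kd_loss(student(x), t_logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

def multi_assistant_distill(teacher, assistants, student, loader, epochs=30):
    """Pass knowledge teacher -> assistant_1 -> ... -> assistant_k -> student."""
    current_teacher = teacher
    for assistant in assistants:  # assistants ordered from largest to smallest
        current_teacher = distill(current_teacher, assistant, loader, epochs)
    return distill(current_teacher, student, loader, epochs)

Each assistant in the chain is first trained against the model above it and then acts as the teacher for the model below it, which is the mechanism the abstract describes for narrowing the teacher-student performance gap.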

Key words: Knowledge distillation, Object recognition, Multi-assistant, Dynamic setting, DSKD
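The abstract also describes a dynamic stopping strategy in which student models trained with different methods serve as a control group. The exact criterion is not specified on this page; the sketch below assumes one plausible reading: a control student trained with plain cross-entropy runs alongside the distilled student, and distillation stops once the distilled student's validation-accuracy gain over the control has not improved by more than a margin for a given number of epochs. The margin and patience values are placeholders.

# Minimal sketch (an assumption, not the paper's algorithm) of a dynamic stopping check
# driven by a control-group student.
import torch

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Plain top-1 accuracy on a validation loader."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / max(total, 1)

class DynamicStopping:
    """Stops distillation when the gain over the control-group student stalls."""
    def __init__(self, margin=0.002, patience=5):
        self.margin = margin
        self.patience = patience
        self.best_gain = float("-inf")
        self.stale_epochs = 0

    def should_stop(self, distilled_acc, control_acc):
        gain = distilled_acc - control_acc
        if gain > self.best_gain + self.margin:
            self.best_gain = gain
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience

In a training loop, should_stop(accuracy(distilled, val_loader), accuracy(control, val_loader)) would be evaluated once per epoch, and the distillation phase ends at the first epoch for which it returns True.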

CLC Number: TP301