Computer Science ›› 2025, Vol. 52 ›› Issue (5): 241-247. doi: 10.11896/jsjkx.240700059

• Artificial Intelligence •

Multi-assistant Dynamic Setting Method for Knowledge Distillation

SI Yuehang1, CHENG Qing1,2, HUANG Jincai1   

  1. Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
  2. Hunan Advanced Technology Research Institute, Changsha 410072, China
  • Received: 2024-07-09  Revised: 2024-11-07  Online: 2025-05-15  Published: 2025-05-12
  • About author: SI Yuehang, born in 2000, Ph.D. His main research interests include data fusion and knowledge processing.
    CHENG Qing, born in 1986, associate professor, is a member of CCF (No.31422G). His main research interests include knowledge reasoning and intelligent Q&A.

Abstract: Knowledge distillation is attracting increasing attention in key areas such as model compression for object recognition. Through an in-depth study of distillation efficiency and an analysis of how knowledge transfers between the teacher and student models, it is found that a well-chosen assistant model can significantly narrow the performance gap between teacher and student, whereas an unreasonable choice of the scale and number of assistant models can harm the student. This paper therefore proposes a multi-assistant knowledge distillation training framework that dynamically adjusts the number and scale of assistant models to optimize knowledge transfer from teacher to student, thereby improving the training accuracy of the student model. In addition, a dynamic stopping strategy for knowledge distillation is designed: student models trained in different ways serve as a control group, and the number of distillation epochs is set individually for each student, further improving training efficiency and yielding a more streamlined and efficient multi-assistant knowledge distillation framework. Experiments on public datasets demonstrate the effectiveness of the proposed multi-assistant dynamic setting method for knowledge distillation.
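To make the training pipeline described in the abstract concrete, the sketch below shows one way a teacher-assistant distillation chain with a control-based dynamic stopping check could be wired up in PyTorch. This is an illustrative reading of the abstract only, not the authors' DSKD implementation; the loss form, hyperparameters, the fixed list of assistants, and the stopping rule (accuracy margin over a control student with a patience counter) are all assumptions.

```python
# Minimal sketch (assumed, not the paper's code): chain-style knowledge
# distillation through a sequence of assistant models, with an early-stopping
# check against a control student trained without distillation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    """Standard Hinton-style distillation loss: soft KL term + hard CE term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

def distill_one_stage(teacher, student, loader, epochs, device,
                      control_acc_fn=None, patience=3):
    """Distill `teacher` into `student`; optionally stop the stage early once
    the student stops improving over a control model for `patience` epochs."""
    teacher.eval()
    opt = torch.optim.SGD(student.parameters(), lr=0.05, momentum=0.9)
    best_gain, stall = float("-inf"), 0
    for _ in range(epochs):
        student.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                t_logits = teacher(x)          # frozen teacher predictions
            loss = kd_loss(student(x), t_logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        if control_acc_fn is not None:
            # Hypothetical callback: validation-accuracy margin of the
            # distilled student over the control student.
            gain = control_acc_fn(student)
            if gain > best_gain:
                best_gain, stall = gain, 0
            else:
                stall += 1
            if stall >= patience:              # dynamic stopping of this stage
                break
    return student

def multi_assistant_distillation(teacher, assistants, student, loader,
                                 epochs, device, control_acc_fn=None):
    """Pass knowledge teacher -> assistant_1 -> ... -> assistant_k -> student.
    The `assistants` list stands in for the paper's dynamically chosen
    number and scale of assistant models."""
    current_teacher = teacher
    for assistant in assistants:
        current_teacher = distill_one_stage(current_teacher, assistant, loader,
                                            epochs, device, control_acc_fn)
    return distill_one_stage(current_teacher, student, loader,
                             epochs, device, control_acc_fn)
```

In this sketch the assistant chain is fixed in advance; the dynamic setting described in the abstract would instead choose the number and scale of assistants adaptively during training, using the control-group comparison to decide when each distillation stage should stop.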

Key words: Knowledge distillation, Object recognition, Multi-assistant, Dynamic setting, DSKD

CLC Number: TP301