Computer Science ›› 2024, Vol. 51 ›› Issue (5): 313-320. doi: 10.11896/jsjkx.240100038
SUN Jing1, WANG Xiaoxia2
Abstract: In the current training and distribution pipeline of convolutional neural network (CNN) models, the cloud holds abundant computing resources and datasets but struggles to meet the fragmented demands of edge scenarios, while the edge side can train and run models directly but can hardly reuse CNN models trained in the cloud under uniform rules. To address the low effectiveness of compression-oriented model training and inference on resource-constrained edge devices, this paper first proposes a cloud-edge collaborative framework for model distribution and training. The framework combines the respective strengths of the cloud and the edge to retrain models, satisfying edge-side requirements on designated recognition targets, designated hardware resources, and designated accuracy. Second, building on the cloud-edge collaborative training idea, knowledge distillation is improved and two new subclass knowledge distillation methods are proposed, one based on logits and one based on channels (SLKD and SCKD): the cloud server first provides a model capable of multi-target recognition, and the edge then retrains it through subclass knowledge distillation into a lightweight model deployable in resource-constrained scenarios. Finally, the effectiveness of the joint training framework and the two subclass distillation algorithms is verified on the public CIFAR-10 dataset. Experimental results show that at a compression ratio of 50%, the proposed models improve inference accuracy significantly (by 10%~11%) over the model covering all classes; compared with retraining the model from scratch, models trained via knowledge distillation also reach markedly higher accuracy, and the accuracy gain grows as the compression ratio increases.
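As a rough illustration of the logits-based variant (SLKD), the sketch below slices a multi-class cloud teacher's logits down to the subclasses retained at the edge before applying standard temperature-scaled distillation. This is a minimal sketch based only on the abstract; the function name slkd_loss, the temperature, and the weighting factor alpha are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def slkd_loss(student_logits, teacher_logits, targets, subclass_idx,
              temperature=4.0, alpha=0.7):
    """Distillation loss over a retained subset of the teacher's classes.

    student_logits: (N, K) student logits over the K retained subclasses.
    teacher_logits: (N, C) cloud teacher logits over all C classes.
    targets:        (N,)  ground-truth labels re-indexed into [0, K).
    subclass_idx:   (K,)  indices of the teacher classes kept at the edge.
    """
    # Slice the teacher's logits to the retained subclasses so its soft
    # targets match the student's reduced output space.
    teacher_sub = teacher_logits[:, subclass_idx]

    # Soft-label term: KL divergence between temperature-softened
    # distributions, rescaled by T^2 as in standard distillation.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_sub / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard-label term: ordinary cross-entropy on the edge-side labels.
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1 - alpha) * ce

For example, with subclass_idx = torch.tensor([3, 5, 8]), an edge student with a 3-class head would be distilled from a 10-class CIFAR-10 teacher, receiving soft targets only for the three classes it must recognize.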