Computer Science ›› 2024, Vol. 51 ›› Issue (5): 313-320. doi: 10.11896/jsjkx.240100038
SUN Jing1, WANG Xiaoxia2
Abstract: In the current training and distribution pipeline of convolutional neural network (CNN) models, the cloud holds abundant computing resources and datasets but struggles to meet the fragmented demands of edge scenarios, while the edge side can train and run models directly but can hardly reuse CNN models trained in the cloud under uniform rules. To address the low effectiveness of compression-oriented model training and inference on resource-constrained edge devices, this paper first proposes a cloud-edge collaborative framework for model distribution and training. The framework combines the respective strengths of the cloud and the edge to retrain models, satisfying edge-side requirements on designated recognition targets, designated hardware resources, and designated accuracy. Second, building on the cloud-edge collaborative training idea, knowledge distillation is improved and two new subclass knowledge distillation methods are proposed, one based on logits and one based on channels (SLKD and SCKD): the cloud server first provides a model capable of multi-target recognition, and the edge then retrains it through subclass knowledge distillation into a lightweight model deployable in resource-constrained scenarios. Finally, the effectiveness of the joint training framework and the two subclass distillation algorithms is verified on the public CIFAR-10 dataset. Experimental results show that at a compression ratio of 50%, the proposed models improve inference accuracy significantly (by 10%~11%) over the model covering all classes; compared with retraining the model from scratch, models trained via knowledge distillation also reach markedly higher accuracy, and the accuracy gain grows as the compression ratio increases.
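As a rough illustration of the logits-based variant (SLKD), the sketch below slices a multi-class cloud teacher's logits down to the subclasses retained at the edge before applying standard temperature-scaled distillation. This is a minimal sketch based only on the abstract; the function name slkd_loss, the temperature, and the weighting factor alpha are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def slkd_loss(student_logits, teacher_logits, targets, subclass_idx,
              temperature=4.0, alpha=0.7):
    """Distillation loss over a retained subset of the teacher's classes.

    student_logits: (N, K) student logits over the K retained subclasses.
    teacher_logits: (N, C) cloud teacher logits over all C classes.
    targets:        (N,)  ground-truth labels re-indexed into [0, K).
    subclass_idx:   (K,)  indices of the teacher classes kept at the edge.
    """
    # Slice the teacher's logits to the retained subclasses so its soft
    # targets match the student's reduced output space.
    teacher_sub = teacher_logits[:, subclass_idx]

    # Soft-label term: KL divergence between temperature-softened
    # distributions, rescaled by T^2 as in standard distillation.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_sub / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard-label term: ordinary cross-entropy on the edge-side labels.
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1 - alpha) * ce

For example, with subclass_idx = torch.tensor([3, 5, 8]), an edge student with a 3-class head would be distilled from a 10-class CIFAR-10 teacher, receiving soft targets only for the three classes it must recognize.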