Computer Science ›› 2024, Vol. 51 ›› Issue (5): 313-320. doi: 10.11896/jsjkx.240100038

• Computer Network •

Convolutional Neural Network Model Compression Method Based on Cloud-Edge Collaborative Subclass Distillation

SUN Jing1, WANG Xiaoxia2   

  1. Department of Intelligent Science and Information Law, East China University of Political Science and Law, Shanghai 201620, China
  2. School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
  • Received: 2024-01-02  Revised: 2024-03-25  Online: 2024-05-15  Published: 2024-05-08
  • About author: SUN Jing, born in 1985, Ph.D, lecturer, is a member of CCF (No.30246M). Her main research interests include distributed storage systems, edge computing, and knowledge distillation.
  • Supported by:
    National Natural Science Foundation of China (12161080).

Abstract: In the current training and distribution pipeline for convolutional neural network models, the cloud has abundant computing resources and datasets, but it struggles to meet the fragmented demands of edge scenarios. The edge side can train and run inference locally, yet it is difficult for it to directly reuse, under unified rules, the convolutional neural network models trained in the cloud. To address the poor training and inference effectiveness of convolutional neural network model compression under the limited resources of the edge side, a model distribution and training framework based on cloud-edge collaboration is first proposed. The framework combines the advantages of the cloud and the edge for model retraining, meeting the edge's requirements for specified recognition targets, specified hardware resources, and specified accuracy. Second, building on the training approach of this cloud-edge collaborative framework, new subclass knowledge distillation methods based on logits and on channels (SLKD and SCKD) are proposed to improve knowledge distillation. The cloud server first provides a model capable of multi-target recognition; through subclass knowledge distillation, this model is then retrained on the edge side into a lightweight model that can be deployed in resource-limited scenarios. Finally, the effectiveness of the joint training framework and of the two subclass distillation algorithms is validated on the CIFAR-10 dataset. Experimental results show that, at a compression ratio of 50%, inference accuracy improves by 10% to 11% compared with full-classification models. Compared with retraining the model from scratch, models trained with the knowledge distillation methods also achieve substantially higher accuracy, and the higher the compression ratio, the more significant the accuracy gain.
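To make the distillation idea concrete, below is a minimal PyTorch sketch of what the two kinds of loss could look like: a logit-based term that restricts Hinton-style soft targets to the subclass of categories the edge model must recognize, and a channel-level term that aligns intermediate feature maps through a 1x1 projection. The names (subclass_logit_kd_loss, ChannelHintLoss, subclass_idx), the temperature default, and the FitNets-style hint formulation are illustrative assumptions, not the paper's exact SLKD/SCKD.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def subclass_logit_kd_loss(student_logits, teacher_logits, subclass_idx, T=4.0):
        # Soft-target loss over only the edge-side target classes.
        #   student_logits: [batch, len(subclass_idx)]  (lightweight edge model)
        #   teacher_logits: [batch, num_full_classes]   (full cloud model)
        #   subclass_idx:   long tensor of retained class indices
        # Restrict the teacher's output to the subclass served by the edge model.
        teacher_sub = teacher_logits[:, subclass_idx]
        # Hinton-style KL between temperature-softened distributions, scaled by T^2.
        soft_targets = F.softmax(teacher_sub / T, dim=1)
        log_probs = F.log_softmax(student_logits / T, dim=1)
        return F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)

    class ChannelHintLoss(nn.Module):
        # Generic channel-level feature alignment: a 1x1 convolution projects the
        # student's feature map to the teacher's channel count before an MSE match.
        def __init__(self, student_channels, teacher_channels):
            super().__init__()
            self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

        def forward(self, student_feat, teacher_feat):
            # Feature maps are assumed to share spatial size: [B, C, H, W];
            # the frozen teacher's features are detached from the graph.
            return F.mse_loss(self.proj(student_feat), teacher_feat.detach())

In use, both terms would be weighted against a cross-entropy loss on hard labels remapped into the subclass index space, e.g. for an edge model that keeps 5 of CIFAR-10's 10 classes, matching the 50% compression setting reported above.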

Key words: Cloud-edge collaboration, Deep learning, Knowledge distillation, Model compression, Feature extraction

CLC Number: TP391.4