Computer Science, 2023, Vol. 50, Issue (11): 306-316. doi: 10.11896/jsjkx.230300078

• Computer Network •

Adaptive Model Quantization Method for Intelligent Internet of Things Terminal

WANG Yuzhan, GUO Bin, WANG Hongli, LIU Sicong   

  1. College of Computer Science,Northwestern Polytechnical University,Xi’an 710129,China
  • Received: 2023-03-10  Revised: 2023-07-20  Online: 2023-11-15  Published: 2023-11-06
  • About author: WANG Yuzhan, born in 2000, master. His main research interests include mobile computing, model compression, and middleware for the Internet of Things. GUO Bin, born in 1980, Ph.D, professor. His main research interests include ubiquitous computing, mobile crowd sensing, and HCI.
  • Supported by:
    National Science Fund for Distinguished Young Scholars (62025205) and National Natural Science Foundation of China (62032020, 61725205, 62102317).

Abstract: With the rapid development of deep learning and the Internet of Everything, combining deep learning with mobile terminal devices has become a major research hotspot. While deep learning improves the performance of terminal devices, deploying models on resource-constrained terminals still faces many challenges, such as limited computing and storage resources and the inability of deep learning models to adapt to a changing device context. This paper focuses on resource-adaptive quantization of deep models. Specifically, a resource-adaptive mixed-precision model quantization method is proposed. It first constructs the model from a gated network and a backbone network, partitions the model at layer granularity to search for the best quantization policy, and works with edge devices to reduce the model's resource consumption. To evaluate the resulting quantization policy, FPGA-based deployment of the deep learning model is adopted. When the model must be deployed on a resource-constrained edge device, adaptive training is performed according to the resource constraints, and a quantization-aware method is adopted to reduce the accuracy loss caused by quantization. Experimental results show that the proposed method can reduce storage space by 50% while retaining 78% accuracy, and reduce energy consumption on the FPGA device by 60% with no more than 2% accuracy loss.
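
The abstract describes a layer-granularity mixed-precision scheme in which a gating network selects a quantization policy for the backbone and quantization-aware training limits the accuracy loss. The minimal PyTorch sketch below illustrates that kind of mechanism only; it is not the authors' implementation, and the bit-width candidates, the fake_quantize/GatedQuantLinear names, and the soft gating are all illustrative assumptions.

import torch
import torch.nn as nn

BIT_CHOICES = [2, 4, 8]  # illustrative candidate bit-widths per layer (assumption)

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (q - w).detach()  # forward uses q, backward passes gradients to w

class GatedQuantLinear(nn.Module):
    """Linear layer whose bit-width is scored by a small gating network."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.gate = nn.Linear(in_features, len(BIT_CHOICES))  # one score per bit-width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Soft selection keeps the quantization policy differentiable during training;
        # at deployment the argmax bit-width would be used for this layer.
        probs = torch.softmax(self.gate(x.mean(dim=0, keepdim=True)), dim=-1).squeeze(0)
        w = sum(p * fake_quantize(self.linear.weight, b)
                for p, b in zip(probs, BIT_CHOICES))
        return nn.functional.linear(x, w, self.linear.bias)

if __name__ == "__main__":
    layer = GatedQuantLinear(16, 8)
    y = layer(torch.randn(4, 16))
    y.sum().backward()   # gradients reach both the backbone weights and the gate
    print(y.shape)       # torch.Size([4, 8])

At deployment time, taking the argmax of the gate scores would give a concrete per-layer bit-width table that could then be mapped to fixed-point kernels on an FPGA.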

Key words: AIoT, Deep learning, Model quantization, Resource adaptation, FPGA

CLC Number: TP391