Computer Science, 2023, Vol. 50, Issue (11): 306-316. DOI: 10.11896/jsjkx.230300078

• Computer Networks •

Adaptive Model Quantization Method for Intelligent Internet of Things Terminal

WANG Yuzhan, GUO Bin, WANG Hongli, LIU Sicong

  1. College of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
  • Received: 2023-03-10  Revised: 2023-07-20  Online: 2023-11-15  Published: 2023-11-06
  • Corresponding author: GUO Bin (guob@nwpu.edu.cn)
  • About author: WANG Yuzhan (wyz_yy@mail.nwpu.edu.cn)
  • Supported by:
    National Science Fund for Distinguished Young Scholars (62025205) and National Natural Science Foundation of China (62032020, 61725205, 62102317).

Adaptive Model Quantization Method for Intelligent Internet of Things Terminal

WANG Yuzhan, GUO Bin, WANG Hongli, LIU Sicong   

  1. College of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
  • Received: 2023-03-10  Revised: 2023-07-20  Online: 2023-11-15  Published: 2023-11-06
  • About author: WANG Yuzhan, born in 2000, master. His main research interests include mobile computing, model compression, and middleware for the Internet of Things. GUO Bin, born in 1980, Ph.D., professor. His main research interests include ubiquitous computing, mobile crowd sensing, and HCI.
  • Supported by:
    National Science Fund for Distinguished Young Scholars (62025205) and National Natural Science Foundation of China (62032020, 61725205, 62102317).

Abstract: With the rapid development of deep learning and the Internet of Everything, combining deep learning with mobile terminal devices has become a major research hotspot. While deep learning brings performance gains to terminal devices, deploying models on resource-constrained terminals still faces many challenges, such as the limited computing and storage resources of terminal devices and the difficulty deep learning models have in adapting to constantly changing device states. Motivated by this, this paper studies resource-adaptive quantization of deep learning models. A resource-adaptive mixed-precision model quantization method is proposed, which builds the model from a gating network and a backbone network, searches for the model's best quantization policy at layer granularity, and works with edge devices to reduce the model's resource consumption. To find the optimal quantization policy, FPGA-based deployment of the deep learning model is adopted. When the model has to be deployed on a resource-constrained edge device, adaptive training is performed according to the resource constraints, and a quantization-aware method is used to reduce the accuracy loss caused by quantization. Experimental results show that the proposed method reduces storage space by 50% while retaining 78% accuracy; on the FPGA device, model accuracy drops by no more than 2% while energy consumption is reduced by 60%.
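To make the layer-granularity policy search described above concrete, the sketch below shows one way a small gating network could map a resource budget to a bit-width choice for each backbone layer. It is a minimal PyTorch illustration under assumed names and shapes (GatedMixedPrecisionNet, quantize_weight, a (2, 4, 8) bit-width menu, a toy fully connected backbone); it is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_weight(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric, per-tensor uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

class GatedMixedPrecisionNet(nn.Module):
    """Backbone whose per-layer bit widths are selected by a small gating network."""
    def __init__(self, bit_choices=(2, 4, 8)):
        super().__init__()
        self.bit_choices = bit_choices
        self.backbone = nn.ModuleList([
            nn.Linear(784, 256), nn.Linear(256, 128), nn.Linear(128, 10)
        ])
        # The gate maps a scalar resource-budget descriptor to one score per
        # (layer, bit-width) pair; argmax over choices gives the per-layer policy.
        self.gate = nn.Linear(1, len(self.backbone) * len(bit_choices))

    def forward(self, x, budget):
        scores = self.gate(budget).view(len(self.backbone), len(self.bit_choices))
        policy = scores.argmax(dim=-1)  # hard bit-width decision for each layer
        for i, (layer, choice) in enumerate(zip(self.backbone, policy)):
            w_q = quantize_weight(layer.weight, self.bit_choices[int(choice)])
            x = F.linear(x, w_q, layer.bias)
            if i < len(self.backbone) - 1:
                x = torch.relu(x)
        return x

net = GatedMixedPrecisionNet()
logits = net(torch.randn(1, 784), torch.tensor([[0.5]]))  # 0.5 ~ half the memory budget
```

In a full system the hard argmax would need a differentiable relaxation or a policy-gradient signal during training, and the budget descriptor would encode the actual device constraints rather than a single scalar.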

Key words: AIoT, Deep learning, Model quantization, Resource adaptation, FPGA

Abstract: With the rapid development of deep learning and the Internet of Everything, the combination of deep learning and mobile terminal devices has become a major research hotspot. While deep learning improves the performance of terminal devices, deploying models on resource-constrained terminals still faces many challenges, such as limited computing and storage resources and the inability of deep learning models to adapt to a changing device context. We focus on resource-adaptive quantization of deep models. Specifically, a resource-adaptive mixed-precision model quantization method is proposed: it first uses a gating network and a backbone network to construct the model, partitions the model at layer granularity to find its best quantization policy, and works with edge devices to reduce resource consumption. To find the optimal quantization policy, FPGA-based deep learning model deployment is adopted. When the model needs to be deployed on resource-constrained edge devices, adaptive training is performed according to the resource constraints, and a quantization-aware method is adopted to reduce the accuracy loss caused by quantization. Experimental results show that our method reduces storage space by 50% while retaining 78% accuracy, and reduces energy consumption by 60% on the FPGA device with no more than 2% accuracy loss.
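The quantization-aware step mentioned in the abstract is typically realized by training with fake-quantized weights and a straight-through estimator, so that the full-precision weights learn to absorb rounding error before deployment. The following minimal PyTorch sketch shows that idea; FakeQuantSTE, QuantLinear, and the chosen bit widths are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class FakeQuantSTE(torch.autograd.Function):
    """Uniform fake quantization; straight-through estimator in the backward pass."""
    @staticmethod
    def forward(ctx, w, bits):
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Pass gradients through unchanged so the full-precision weights keep learning.
        return grad_output, None

class QuantLinear(nn.Linear):
    """Linear layer whose forward pass uses fake-quantized weights (quantization-aware training)."""
    def __init__(self, in_features, out_features, bits=8):
        super().__init__(in_features, out_features)
        self.bits = bits

    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.weight, self.bits)
        return nn.functional.linear(x, w_q, self.bias)

# One training step: the forward pass sees low-precision weights, so the model
# learns to tolerate the rounding error it will meet after deployment.
model = nn.Sequential(QuantLinear(784, 128, bits=4), nn.ReLU(), QuantLinear(128, 10, bits=8))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```

At deployment time the fake-quantized weights would be exported as true low-bit integers for the FPGA kernels, which is where the reported storage and energy savings would come from.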

Key words: AIoT, Deep learning, Model quantization, Resource adaptation, FPGA

CLC Number: TP391