Computer Science ›› 2023, Vol. 50 ›› Issue (11): 306-316. doi: 10.11896/jsjkx.230300078
王羽展, 郭斌, 王虹力, 刘思聪
WANG Yuzhan, GUO Bin, WANG Hongli, LIU Sicong
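Abstract: With the rapid development of deep learning and the Internet of Everything, combining deep learning with mobile terminal devices has become a major research focus. Deep learning improves the performance of terminal devices, but deploying models on resource-constrained terminals also raises many challenges: the devices have limited computing and storage resources, and deep learning models adapt poorly to constantly changing device states. To address this, the paper studies resource-adaptive quantization of deep learning models. It proposes a resource-adaptive mixed-precision quantization method that builds the model from a gating network and a backbone network, searches for the best quantization policy at layer granularity, and reduces the model's resource consumption in conjunction with edge devices. To find the optimal quantization policy, the deep learning model is deployed on an FPGA. When the model must be deployed on a resource-constrained edge device, it is trained adaptively according to the resource constraints, and a quantization-aware method is used to reduce the accuracy loss caused by quantization. Experimental results show that the method retains 78% accuracy while reducing storage by 50%; on the FPGA device, model accuracy drops by no more than 2% while energy consumption falls by 60%.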
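The idea of a layer-wise mixed-precision policy can be illustrated with a minimal sketch. It is not the paper's implementation: the layer names, weight counts, bit-widths, and the helper names `fake_quantize` and `storage_bits` are all hypothetical, and the symmetric uniform quantizer is a common textbook choice assumed here for illustration. The sketch shows how assigning each layer its own bit-width changes the total weight storage, and how a quantize-dequantize step of the kind used in quantization-aware training rounds weights onto a low-precision grid.

```python
from typing import Dict, List


def fake_quantize(weights: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantize-dequantize of a weight list to `bits` bits.

    Assumed quantizer for illustration only; the paper's scheme may differ.
    """
    if bits < 2:
        raise ValueError("bits must be >= 2")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    qmax = 2 ** (bits - 1) - 1          # largest positive integer level
    scale = max_abs / qmax              # step size of the quantization grid
    # Round to the nearest level, clamp to the range, map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def storage_bits(layer_sizes: Dict[str, int], policy: Dict[str, int]) -> int:
    """Total weight storage in bits for a per-layer bit-width policy."""
    return sum(count * policy[name] for name, count in layer_sizes.items())


# Hypothetical layers (name -> number of weights); not taken from the paper.
layer_sizes = {"conv1": 1000, "conv2": 4000, "fc": 3000}

uniform_8 = {name: 8 for name in layer_sizes}   # every layer at 8 bits
mixed = {"conv1": 8, "conv2": 4, "fc": 4}       # one possible mixed policy

saving = 1 - storage_bits(layer_sizes, mixed) / storage_bits(layer_sizes, uniform_8)
print(f"storage saved by the mixed policy: {saving:.2%}")
```

In the method the abstract describes, a gating network selects the per-layer policy under a resource constraint; here the policy is simply written by hand, and the numbers are illustrative rather than reproductions of the reported results.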