Computer Science, 2023, Vol. 50, Issue (11A): 220800045-7. doi: 10.11896/jsjkx.220800045

• Computer Software & Architecture •

  • Corresponding author: YU Yunjun (yuyunjun@ncu.edu.cn)

Lightweight Network Hardware Acceleration Design for Edge Computing

YU Yunjun1, ZHANG Pengfei1, GONG Hancheng2, CHEN Min2   

  1 School of Information Engineering, Nanchang University, Nanchang 330000, China
  2 Jiangxi Jiangtou Digital Economy Research Institute, Nanchang 330000, China
  • Published: 2023-11-09
  • About author: YU Yunjun, born in 1978, Ph.D., associate professor. His main research interests include fault diagnosis, active disturbance rejection control (ADRC), data-driven optimal control and its applications in microgrids, and low-carbon electricity technology.
  • Supported by:
    National International Science and Technology Cooperation Project (2014DFG72240) and Key R&D Program in Jiangxi Province (20214BBG74006).

Abstract: As edge devices generate more data and neural networks see wider deployment, edge computing has relieved pressure on big data technologies built around cloud computing. Field programmable gate arrays (FPGAs) are well suited to edge computing and to building neural network accelerators because of their flexible architecture and low power consumption. However, FPGA solutions based on conventional convolution algorithms are often limited by the number of on-chip computing units. This paper uses Zynq as the hardware acceleration platform, quantizes the parameters to fixed point, and applies array partitioning to speed up pipelined execution. The Winograd fast convolution algorithm replaces conventional convolution, converting multiplications in the convolution into additions, which reduces the computational complexity of the model and greatly improves the computational performance of the designed accelerator. Experiments show that the XC7Z035 achieves 43.5 GOP/s at a 150 MHz clock, with energy efficiency 129 times that of a Xeon(R) Silver 4214R and 159 times that of a dual-core ARM. The proposed scheme delivers high performance under limited resources and power consumption, and is suitable for deploying lightweight neural networks at the network edge.

Key words: Edge computing, Hardware acceleration, Lightweight convolutional neural networks, Winograd, FPGA
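
The abstract's point about trading multiplications for additions can be illustrated with the smallest standard Winograd case. The sketch below is a minimal illustration of 1-D Winograd F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications where the direct form needs 6. It is not taken from the paper: the function names are hypothetical, and the tile size the accelerator actually uses is not stated in the abstract.

```python
def direct_conv_f23(d, g):
    """Direct form: 2 outputs of a 3-tap filter over 4 inputs (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using 4 data multiplications."""
    # Filter transform: depends only on the weights, so it can be
    # precomputed once per filter (assumed here to happen offline).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform (additions/subtractions only) followed by the
    # 4 element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform: additions/subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    data = [0.5, -1.25, 2.0, 3.75]
    taps = [0.25, 0.5, -1.0]
    print(direct_conv_f23(data, taps))
    print(winograd_f23(data, taps))
```

Both functions should agree up to floating-point rounding. On an FPGA the saving matters because multipliers map to scarce DSP slices, while the extra additions can be built from general logic.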

CLC Number: 

  • TP391