Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 220800045-7. doi: 10.11896/jsjkx.220800045
YU Yunjun¹, ZHANG Pengfei¹, GONG Hancheng², CHEN Min²
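Abstract: As edge devices generate more data and neural networks move into deployed applications, edge computing relieves pressure on big-data technologies centered on cloud computing. Field-programmable gate arrays (FPGAs), with their flexible architecture and low power consumption, are well suited to edge computing and to building neural network accelerators. However, FPGA solutions based on the conventional convolution algorithm are often limited by the number of on-chip computing units. This work uses Zynq as the hardware acceleration platform, quantizes the parameters to fixed point, and applies array partitioning to raise pipeline throughput. The Winograd fast convolution algorithm replaces conventional convolution, trading multiplications in the convolution for additions; this lowers the computational complexity of the model and greatly improves the computing performance of the accelerator. Experiments show that the XC7Z035, running at a 150 MHz clock, achieves 43.5 GOP/s, with an energy efficiency 129 times that of a Xeon(R) Silver 4214R and 159 times that of a dual-core ARM processor. The proposed scheme delivers high performance under resource and power constraints and is suitable for deploying lightweight neural networks at the network edge.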
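To make the multiplication-for-addition trade concrete, the following is a minimal sketch of the smallest textbook Winograd case, F(2,3), which produces two outputs of a 3-tap one-dimensional convolution with four multiplications instead of six. The abstract does not state which tile size the accelerator uses, so this case is an assumption chosen for illustration, and the function names `direct_f23`, `transform_filter`, and `winograd_f23` are hypothetical.

```python
def direct_f23(d, g):
    """Direct 3-tap sliding dot product: 2 outputs, 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def transform_filter(g):
    """Filter transform; can be computed once per filter, ahead of time."""
    return [
        g[0],
        (g[0] + g[1] + g[2]) / 2,
        (g[0] - g[1] + g[2]) / 2,
        g[2],
    ]


def winograd_f23(d, u):
    """Winograd F(2,3) on a 4-element input tile d with transformed filter u."""
    # Input transform: additions and subtractions only.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Element-wise product: the only 4 multiplications per tile.
    m = [v[i] * u[i] for i in range(4)]
    # Output transform: additions and subtractions only.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


if __name__ == "__main__":
    tile = [1.0, 2.0, 3.0, 4.0]
    taps = [1.0, 0.0, -1.0]
    print(direct_f23(tile, taps))
    print(winograd_f23(tile, transform_filter(taps)))
```

Because the filter transform is done once per filter, the per-tile cost is four multiplications plus a handful of additions. On an FPGA this matters because multipliers consume scarce DSP resources, while adders map to ordinary logic. Two-dimensional 3×3 convolution is typically handled by nesting the same transform along rows and columns.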