Computer Science (计算机科学), 2023, Vol. 50, Issue (11): 8-14. doi: 10.11896/jsjkx.221100104
WANG Xiaofeng, LI Chaoran, LU Kunfeng, LUAN Tianjiao, YAO Na, ZHOU Hui, XIE Yujia
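Abstract: Compared with traditional methods, scene matching algorithms based on convolutional neural networks (CNNs) offer higher matching accuracy, better adaptability, and stronger resistance to interference. However, these algorithms have massive computation and storage requirements, which makes deployment on edge devices very difficult. To improve real-time computing performance, this paper designs and implements an efficient accelerated computing scheme for the edge. Based on an analysis of the algorithm's computational characteristics and overall architecture, a dedicated accelerator for the feature matching layer is designed using the Winograd fast convolution method, and an overall acceleration scheme is proposed in which the dedicated accelerator and a deep learning processor compute the feature matching layer and the feature extraction network in a pipelined manner. Experiments on the Xilinx ZCU102 development board show that the dedicated accelerator reaches a peak performance of 576 GOPS and an actual performance of 422.08 GOPS, with a DSP utilization efficiency of 4.5 operations/clock. The accelerated computing system reaches a peak performance of 1600 GOPS and reduces the throughput latency of the CNN scene matching algorithm to 157.89 ms. The experimental results show that the proposed scheme makes efficient use of the FPGA's computing resources and achieves real-time computation of the CNN scene matching algorithm.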
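The abstract names the Winograd fast convolution method without giving details. As a minimal illustration of the general idea, the sketch below implements the textbook one-dimensional F(2,3) case, which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. It is not the paper's accelerator design: the tile size, the function names `direct_f23` and `winograd_f23`, and the use of floating point are all assumptions made for illustration.

```python
def direct_f23(d, g):
    """Direct 3-tap sliding dot product: 2 outputs from 4 inputs, 6 multiplies."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using only 4 data-by-filter multiplies."""
    # Filter transform. It depends only on g, so it can be precomputed once
    # per filter and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform (additions only) followed by 4 element-wise multiplies.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    tile = [1.0, 2.0, 3.0, 4.0]      # hypothetical input tile
    taps = [0.5, -1.0, 0.25]         # hypothetical filter taps
    print(direct_f23(tile, taps))
    print(winograd_f23(tile, taps))  # should agree up to rounding
```

The trade-off is fewer multiplications in exchange for more additions and a one-time filter transform, which is why the method tends to suit hardware where multipliers (such as FPGA DSP slices) are the scarce resource.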