Computer Science ›› 2023, Vol. 50 ›› Issue (11): 8-14. doi: 10.11896/jsjkx.221100104

• High Performance Computing •

Acceleration Design and FPGA Implementation of CNN Scene Matching Algorithm

WANG Xiaofeng, LI Chaoran, LU Kunfeng, LUAN Tianjiao, YAO Na, ZHOU Hui, XIE Yujia   

  1. Beijing Aerospace Automatic Control Institute, Beijing 100854, China; National Aerospace Intelligence Control Technology Laboratory, Beijing 100854, China
  • Received: 2022-11-12 Revised: 2023-05-11 Online: 2023-11-15 Published: 2023-11-06
  • About author: WANG Xiaofeng, born in 1995, master, engineer. His main research interest is intelligent computing.
  • Supported by:
    State Key Laboratory Fund(61425010302).

Abstract: Compared with traditional methods, the CNN-based scene matching algorithm offers higher matching accuracy, better adaptability, and stronger anti-interference ability. However, its massive computing and storage requirements make it difficult to deploy at the edge. To improve real-time performance, an efficient edge-side acceleration scheme is designed and implemented. Based on an analysis of the algorithm's computation characteristics and overall architecture, a correlation specific accelerator (CSA) is designed using the Winograd fast convolution method, and an acceleration scheme is proposed in which the CSA and a deep-learning processor unit (DPU) compute the feature correlation layer and the feature extraction network in a pipelined manner. Experiments on Xilinx's ZCU102 development board show that the CSA reaches a peak performance of 576 GOPS and an actual performance of 422.08 GOPS, with a DSP usage efficiency of 4.5 operations/clock. The peak performance of the acceleration system reaches 1600 GOPS, and the throughput delay of the algorithm is reduced to 157.89 ms. The experimental results show that the acceleration scheme efficiently utilizes the computing resources of the FPGA and achieves real-time computing of the CNN-based scene matching algorithm.

Key words: Acceleration computing, Scene matching algorithm, Deep learning, FPGA, Winograd algorithm, Specific accelerator

CLC Number: 

  • TP391
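
Illustrative note: the abstract states that the CSA is based on the Winograd fast convolution method but does not give the tile size or data format used. The sketch below is therefore only an assumption-laden illustration of the general idea, using the standard one-dimensional minimal filtering case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The function names `direct_f23` and `winograd_f23` are hypothetical and are not taken from the paper.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap filter over four inputs (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs using 4 multiplications.

    Assumption: this tile size is chosen for illustration only; the
    abstract does not say which Winograd variant the CSA implements.
    """
    # Filter transform (can be precomputed once per filter).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform (additions and subtractions only).
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # Element-wise products: the only 4 multiplications per tile.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3

    # Output transform (additions and subtractions only).
    return [m0 + m1 + m2, m1 - m2 - m3]


if __name__ == "__main__":
    d = [1, 2, 3, 4]
    g = [1, 2, 3]
    print(direct_f23(d, g), winograd_f23(d, g))
```

Trading multiplications for additions in this way is what makes Winograd-style convolution attractive on FPGAs, where multipliers map to a limited number of DSP slices while additions can use general logic.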