FPGA-based CNN Image Recognition Acceleration and Optimization

Computer Science, 2021, Vol. 48, Issue (4): 205-212. doi: 10.11896/jsjkx.200600089

Section: Computer Graphics & Multimedia

QI Yan-rong¹, ZHOU Xia-bing², LI Bin¹, ZHOU Qing-lei¹

1 School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
2 School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Received: 2020-06-24 Revised: 2020-08-13 Online: 2021-04-15 Published: 2021-04-09
Corresponding author: LI Bin (iebinli@zzu.edu.cn)
About author: QI Yan-rong, born in 1995, postgraduate. Her main research interests include image processing and high-performance computing. (17319793885@163.com)
LI Bin, born in 1986, Ph.D, associate professor. His main research interests include information security and high-performance computing.
Supported by:
National Key R&D Program "Public Safety Risk Prevention and Control and Emergency Technology Assembly" Key Special Project (2018XXXXXXX01) and National Natural Science Foundation of China (61702518).

Abstract

Abstract: Convolutional neural networks (CNNs) are widely used in image classification, speech recognition, video analysis, document analysis, and other applications. Because CNNs are computationally intensive, they are often accelerated with GPUs; however, GPUs have high power consumption, which makes them poorly suited to the CNN inference stage. This paper therefore studies FPGA-based acceleration and optimization of CNN image recognition, using the OpenCL SDK provided by Intel FPGA to design and optimize a CNN forward model on an FPGA board. First, to address the computational load, the design is divided into functional modules so as to exploit the high computing efficiency of the FPGA. Second, the core algorithm is optimized to improve running speed; the feature map processing operations are analyzed and a parameter sharing strategy is used to reduce the amount of data stored; and data is transferred through channels to reduce the number of accesses to off-chip memory. Finally, the data cache, data flow, and loops are optimized to alleviate the on-chip resource constraints of the FPGA, and the parameters are quantized to reduce FPGA memory usage. Experimental results show that the FPGA has lower power consumption: the CPU consumes 2.1 times as much power as the FPGA, and the GPU 6.5 times as much. Compared with methods proposed in the related literature in recent years, the proposed method achieves higher throughput and computational performance.

Key words: CNN, FPGA, OpenCL, Module division, Data flow optimization, Image recognition
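The abstract states that the parameters are quantized to reduce FPGA memory usage, but it does not give the bit width or the quantization scheme. As a rough illustration only, the sketch below assumes signed 8-bit fixed point with 6 fractional bits (`FRAC_BITS`, `quantize`, and `dequantize` are hypothetical names and values chosen for this example, not taken from the paper), and it is written in Python rather than the OpenCL used by the authors. It shows how storing weights as 8-bit integers instead of 32-bit floats cuts the storage to one quarter, at the cost of a bounded rounding error.

```python
from array import array
import random

# Assumed fixed-point format: signed 8-bit with 6 fractional bits.
# These values are illustrative and do not come from the paper.
FRAC_BITS = 6
SCALE = 1 << FRAC_BITS


def quantize(weights):
    """Map float weights to signed 8-bit fixed point, saturating at the int8 range."""
    out = array("b")
    for w in weights:
        q = round(w * SCALE)
        out.append(max(-128, min(127, q)))
    return out


def dequantize(qweights):
    """Recover approximate float values from the fixed-point representation."""
    return [q / SCALE for q in qweights]


random.seed(0)
# 1024 synthetic weights in [-1, 1], stored as 32-bit floats.
weights = array("f", (random.uniform(-1.0, 1.0) for _ in range(1024)))
qweights = quantize(weights)
restored = dequantize(qweights)

float_bytes = weights.itemsize * len(weights)
fixed_bytes = qweights.itemsize * len(qweights)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("float32 bytes:", float_bytes)
print("int8 bytes:", fixed_bytes)
print("max rounding error:", max_error)
```

Under these assumptions, any weight inside the representable range is off by at most half a quantization step (1/128 here), while values outside the range saturate. A real design would pick the bit width and fractional bits per layer from the observed weight distribution.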

CLC Number: TP391

Cite this article:

QI Yan-rong, ZHOU Xia-bing, LI Bin, ZHOU Qing-lei. FPGA-based CNN Image Recognition Acceleration and Optimization[J]. Computer Science, 2021, 48(4): 205-212. https://doi.org/10.11896/jsjkx.200600089
