Computer Science ›› 2021, Vol. 48 ›› Issue (4): 205-212. doi: 10.11896/jsjkx.200600089
齐延荣1, 周夏冰2, 李斌1, 周清雷1
QI Yan-rong1, ZHOU Xia-bing2, LI Bin1, ZHOU Qing-lei1
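Abstract: CNNs are now widely used in many applications, including image classification, speech recognition, video analysis, and document analysis. Because CNNs are computation-intensive, they are often accelerated with GPUs, but the high power consumption of GPUs makes them ill-suited to the CNN inference stage. This paper therefore studies methods for accelerating and optimizing CNN image recognition on FPGAs, and uses the OpenCL SDK provided by Intel FPGA to design and optimize a CNN forward model on an FPGA board. First, to address the computational load, the design is partitioned into functional modules so as to make full use of the FPGA's high computational efficiency. Second, the core algorithms are optimized to increase running speed; the feature-map processing operations are analyzed and a parameter-sharing strategy is used to reduce the amount of data stored; and data is transferred through channels to reduce the number of off-chip memory accesses. Finally, the data cache, data flow, and loops are optimized to ease the FPGA's on-chip resource constraints, and the parameters are quantized to reduce FPGA memory usage. Experimental results show that the FPGA has lower power consumption: the CPU consumes 2.1 times as much power and the GPU 6.5 times as much. Compared with methods proposed in recent literature in related fields, the proposed method achieves higher throughput and computational performance.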
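The abstract mentions quantizing parameters to reduce FPGA memory usage but does not say which scheme is used. The following is a minimal sketch, assuming symmetric 8-bit quantization with a single shared scale; the function names `quantize_symmetric` and `dequantize` and the sample weights are hypothetical and are not taken from the paper.

```python
def quantize_symmetric(weights, bits=8):
    """Map float weights to signed integers using one shared scale.

    Assumed scheme for illustration only; the paper's actual
    quantization method is not described in the abstract.
    """
    qmax = 2 ** (bits - 1) - 1                # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the signed range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers."""
    return [v * scale for v in q]


weights = [0.6, -1.0, 0.25, 0.0]              # hypothetical sample values
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
```

Under this assumption each parameter occupies 8 bits instead of the 32 bits of a single-precision float, a fourfold reduction in storage, and each restored value differs from the original by at most half of `scale`.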