Computer Science ›› 2025, Vol. 52 ›› Issue (4): 94-100.doi: 10.11896/jsjkx.241000099

• Smart Embedded Systems •

Efficient Adaptive CNN Accelerator for Resource-limited Chips

PANG Mingyi1, WEI Xianglin2, ZHANG Yunxiang2, WANG Bin2, ZHUANG Jianjun1   

  1 School of Electronics and Information Engineering, Nanjing University of Information Science & Technology, Nanjing 211800, China
  2 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
  • Received: 2024-10-21 Revised: 2025-02-22 Online: 2025-04-15 Published: 2025-04-14
  • About author: PANG Mingyi, born in 2001, postgraduate. His main research interests include hardware acceleration and edge computing.
    WEI Xianglin, born in 1985, Ph.D, associate researcher. His main research interests include edge computing, deep learning and wireless network security.

Abstract: This paper proposes an adaptive convolutional neural network accelerator (ACNNA) for resource-limited non-GPU chips. ACNNA adaptively generates a hardware accelerator from the resource constraints of the hardware platform and the structure of the convolutional neural network. Because it is reconfigurable, ACNNA can accelerate various layer combinations, including convolutional, pooling, activation, and fully connected layers. First, a resource-folding multi-channel processing engine (PE) array is designed; it folds the idealized convolutional structure to save resources and unfolds along the output channel to support parallel computing. Second, multi-level storage and ping-pong caching are adopted to optimize the pipeline and improve data processing efficiency. Third, a resource reuse strategy under multi-level storage is proposed; combined with a design space exploration algorithm, it schedules hardware resources for network parameters more reasonably, so that low-resource chips can deploy deeper network models with more parameters. Taking the LeNet5 and VGG16 network models as examples, this paper validates ACNNA on the Ultra96 V2 development board. The results show that the ACNNA deployment of VGG16 consumes only 4% of the resources required by the original network. At a clock frequency of 100 MHz, the LeNet5 accelerator achieves a computing rate of 0.37 GFLOPS at a power consumption of 2.05 W, and the VGG16 accelerator achieves 1.55 GFLOPS at 2.132 W. Compared with existing work, ACNNA increases frames per second (FPS) by more than 83%.
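To make the idea of design space exploration under a resource budget more concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes a deliberately simple, hypothetical cost model: each PE holds one K×K multiplier window, input channels are folded (processed sequentially), and the only tunable parameter is the output-channel unfolding factor `p_out`. The names `ConvLayer`, `cycles`, `dsp_cost`, and `explore`, as well as the DSP budget of 200, are illustrative assumptions; the layer shapes are the standard LeNet5 convolutional layers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ConvLayer:
    in_ch: int    # input channels
    out_ch: int   # output channels
    kernel: int   # square kernel size K
    out_h: int    # output feature-map height
    out_w: int    # output feature-map width


def cycles(layer: ConvLayer, p_out: int) -> int:
    """Estimated cycles for one layer (hypothetical model).

    Each PE finishes one K*K window per cycle; input channels are folded,
    and p_out output channels are computed in parallel.
    """
    groups = -(-layer.out_ch // p_out)  # ceil(out_ch / p_out)
    return groups * layer.in_ch * layer.out_h * layer.out_w


def dsp_cost(layers: List[ConvLayer], p_out: int) -> int:
    """Assumed DSP usage: one multiplier per kernel tap per parallel PE."""
    return p_out * max(layer.kernel ** 2 for layer in layers)


def explore(layers: List[ConvLayer], dsp_budget: int) -> Optional[Tuple[int, int]]:
    """Exhaustively search p_out and return (best p_out, total cycles).

    Returns None when even p_out = 1 does not fit in the budget.
    """
    best = None
    for p_out in range(1, max(layer.out_ch for layer in layers) + 1):
        if dsp_cost(layers, p_out) > dsp_budget:
            break  # cost grows with p_out, so larger values cannot fit either
        total = sum(cycles(layer, p_out) for layer in layers)
        if best is None or total < best[1]:
            best = (p_out, total)
    return best


# The two convolutional layers of LeNet5 (32x32 input, 5x5 kernels).
lenet5_convs = [
    ConvLayer(in_ch=1, out_ch=6, kernel=5, out_h=28, out_w=28),
    ConvLayer(in_ch=6, out_ch=16, kernel=5, out_h=10, out_w=10),
]

if __name__ == "__main__":
    print(explore(lenet5_convs, dsp_budget=200))
```

A real exploration would also have to account for on-chip memory, bandwidth, and the ping-pong buffers mentioned in the abstract; the sketch only shows the general shape of searching a parallelism factor against a fixed resource limit.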

Key words: Hardware acceleration, Convolutional neural network, Design space exploration strategy, Field programmable gate array

CLC Number: 

  • TP391