Computer Science, 2025, Vol. 52, Issue 12: 1-8. doi: 10.11896/jsjkx.250600014
蔡成欢1, 王一品1, 许嘉滨2, 张逢喆3, 周学功3, 曹伟3, 张帆3, 余新胜4
CAI Chenghuan1, WANG Yipin1, XU Jiabin2, ZHANG Fengzhe3, ZHOU Xuegong3, CAO Wei3, ZHANG Fan3, YU Xinsheng4
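Abstract: Current neural network accelerators built around RISC-V cores provide insufficient acceleration for the matrix and nonlinear computations in Transformer-architecture models. To address this, this work studies a neural network acceleration architecture based on RISC-V instruction extensions and proposes an accelerator architecture named Taurus. To match the characteristics of the model architecture, matrix instruction extensions are added, and a systolic array performs the matrix multiply-accumulate computation. To accelerate nonlinear computation, vector instruction extensions are added, and a special vector unit is designed to compute LayerNorm and Softmax. To keep data supply balanced, the memory-access instruction extensions are optimized so that the matrix and vector compute units are kept supplied with data. The instruction extensions use a scalar-register-based scheme: operand information is stored in registers, which enlarges the addressing space and keeps the number of generated instructions low for large-scale data operations. The Taurus architecture was simulated cycle-accurately on the Gem5 platform. Compared with the open-source accelerator Gemmini, systolic array utilization for general matrix multiplication improves by 80%. For ResNet50 and BERT inference, Taurus achieves speedups of 1.3× and 31.3× over Gemmini, and speedups of 1,467× and 4,513× over RISC-V, respectively.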
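For readers unfamiliar with the two nonlinear operations named in the abstract, the following is a minimal reference sketch of Softmax and LayerNorm in plain Python. It shows only the standard mathematical definitions; it is not the Taurus vector unit's implementation, and the function names, the `eps` default, and the omission of LayerNorm's learned scale and shift parameters are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats (reference definition)."""
    m = max(xs)  # subtract the maximum so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(xs, eps=1e-5):
    """LayerNorm over one vector, without learned scale/shift (assumed omitted)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # population variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in xs]
```

Both functions need a reduction over the whole vector (maximum and sum for Softmax, mean and variance for LayerNorm) before any output element can be produced, which is one reason such operations do not map directly onto a multiply-accumulate array.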