Computer Science ›› 2025, Vol. 52 ›› Issue (5): 83-90. doi: 10.11896/jsjkx.241200074
陈煦豪1,2,4, 胡思鹏1, 刘洪超1,3,4, 刘伯然4,5, 唐丹1,4, 赵地4,5
CHEN Xuhao1,2,4, HU Sipeng1, LIU Hongchao1,3,4, LIU Boran4,5, TANG Dan1,4, ZHAO Di4,5
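Abstract: To meet the high-performance and low-power requirements of edge AI, this work designs an application-specific instruction set processor (ASIP) for edge AI. The design is based on the RISC-V instruction set architecture and targets practical digital signal processing problems on edge devices. With limited hardware overhead, it improves the execution efficiency and reduces the energy consumption of edge AI, and it can meet the demand for efficient large language model (LLM) inference in edge AI applications. In view of the characteristics of LLMs, the RISC-V instruction set is extended with custom instructions that perform vector dot product computation, so that LLM operations are accelerated on dedicated vector dot product hardware. Based on the nanhu version of the open-source high-performance RISC-V processor core "XiangShan", a vector dot product ASIP named nanhu-vdot is implemented, which adds a vector dot product computation unit and the associated pipeline processing logic to XiangShan (nanhu version). FPGA hardware tests of nanhu-vdot show that, with almost no additional hardware resources or power consumption, vector dot product computation achieves a speedup of more than 4x over the scalar method. When second-generation Generative Pre-Trained (Generative Pre-Trained-2, GPT-2) model inference is run with the hardware-software co-design scheme, it is about 30% faster than the pure software implementation.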
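As an illustration of the scalar versus vector dot product contrast described in the abstract, the following is a minimal behavioural sketch in Python. It is not the nanhu-vdot implementation: the lane count `LANES`, the integer element type, and the function names `vdot_op`, `blocked_dot`, and `vdot_issue_count` are all hypothetical, since the abstract does not specify the instruction format or operand width.

```python
from typing import Sequence

# Assumption: number of element pairs consumed by one custom instruction.
# The abstract does not state this value; 8 is chosen only for illustration.
LANES = 8


def scalar_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Baseline scalar method: one multiply-accumulate per loop iteration."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def vdot_op(a_chunk: Sequence[int], b_chunk: Sequence[int]) -> int:
    """Behavioural model of one hypothetical vector dot product instruction.

    Multiplies up to LANES element pairs and returns the sum of the products.
    """
    if len(a_chunk) != len(b_chunk) or len(a_chunk) > LANES:
        raise ValueError("chunks must match in length and fit in LANES")
    return sum(x * y for x, y in zip(a_chunk, b_chunk))


def blocked_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Software side of the sketch: split the vectors into LANES-sized
    chunks and accumulate the result of one vdot_op per chunk."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    for i in range(0, len(a), LANES):
        acc += vdot_op(a[i:i + LANES], b[i:i + LANES])
    return acc


def vdot_issue_count(n: int) -> int:
    """Number of vdot_op calls needed for vectors of length n (ceiling division)."""
    return -(-n // LANES)
```

Under these assumptions, `blocked_dot` returns the same value as `scalar_dot` while issuing one modelled instruction per chunk instead of one multiply-accumulate per element. The sketch says nothing about actual cycle counts, which depend on the pipeline and memory behaviour of the real hardware.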