Computer Science, 2024, Vol. 51, Issue (11A): 231200184-11. doi: 10.11896/jsjkx.231200184
PEI Xue1, WEI Shuai1, SHAO Yangxue2, YU Hong1, GE Chenyang1
Abstract: To meet the differing compilation requirements of cryptographic algorithms, an abstraction method for cryptographic operators of different granularities is proposed. By compiling, optimizing, and mapping operators at different granularities, it addresses the problem of deploying high-order cryptographic operators on FPGAs quickly and efficiently. Hotspot operators are abstracted from cryptographic algorithms to build an operator library, multi-level compilation optimization is applied to optimize and deploy the algorithms, and data tensorization together with register optimization is used to improve the deployment and execution efficiency of high-order cryptographic operators on the VTA hardware architecture. Experimental results show that, with tensorization and register optimization, execution efficiency improves by a factor of 32 over the original compile-and-deploy approach and by about 34 over OpenCL, and that cryptographic algorithms can be rapidly developed and implemented on top of the constructed operator library.
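The paper's operator library and VTA mapping are not reproduced here. As an illustration only, the sketch below shows how a hypothetical byte-wise XOR hotspot operator (an assumed example, not taken from the paper) can be expressed with TVM's classic tensor-expression API and tiled so that the inner loop matches an accelerator's lane width; vectorize is used as a stand-in for the actual VTA tensorization step, and all shapes, names, and the factor of 16 are assumptions.

    import tvm
    from tvm import te

    # Assumed example: a 1-D byte-wise XOR hotspot operator (e.g. key mixing).
    N = 1024                                            # assumed operator size
    A = te.placeholder((N,), dtype="uint8", name="A")
    B = te.placeholder((N,), dtype="uint8", name="B")
    C = te.compute((N,), lambda i: A[i] ^ B[i], name="xor_op")

    # Classic TE schedule: split the loop by an assumed hardware lane width of 16
    # so the inner loop could later be bound to an accelerator intrinsic.
    s = te.create_schedule(C.op)
    outer, inner = s[C].split(C.op.axis[0], factor=16)
    s[C].vectorize(inner)   # stand-in here for tensorizing onto the VTA datapath

    # Inspect the lowered loop nest to confirm the tiling took effect.
    print(tvm.lower(s, [A, B, C], simple_mode=True))

In a VTA-style flow, the inner tile would instead be bound to a declared tensor intrinsic so the operator executes on the accelerator datapath rather than on host vector units; the tiling shown here is only the loop restructuring that makes such a binding possible.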