Computer Science, 2024, Vol. 51, Issue (11A): 231200184-11. doi: 10.11896/jsjkx.231200184

• Computer Software & Architecture •

  • Corresponding author: PEI Xue (peixue1949@163.com)

Compilation Optimization and Implementation of High-order Cryptographic Operators on FPGA

PEI Xue¹, WEI Shuai¹, SHAO Yangxue², YU Hong¹, GE Chenyang¹

  1 Institute of Information Technology, Information Engineering University, Zhengzhou 450002, China
    2 Songshan Laboratory, Zhengzhou 450046, China
  • Online: 2024-11-16  Published: 2024-11-13
  • About author: PEI Xue, born in 1992, master, assistant researcher. Her main research interest is software and hardware co-compilation.
  • Supported by:
    National Key Research and Development Program of China (2022YFB4401401) and Program of Songshan Laboratory (included in the management of Major Science and Technology Program of Henan Province) (221100211100-01).


Abstract: To address the differing compilation requirements of cryptographic algorithms, a method for abstracting cryptographic operators at different granularities is proposed. Through compilation optimization and mapping of operators at different granularities, the method tackles the problem of deploying high-order cryptographic operators on FPGAs rapidly and efficiently. Hotspot operators are abstracted from cryptographic algorithms to build an operator library, and multi-level compilation optimization is used to optimize and deploy cryptographic algorithms. Data tensorization and register optimization are employed to improve the deployment and computational efficiency of high-order cryptographic operators on the VTA (Versatile Tensor Accelerator) hardware architecture. Experimental results show that, with tensorization and register optimization, execution efficiency is 32 times higher than that of the original compilation and deployment method and approximately 34 times higher than that of OpenCL. In addition, the constructed operator library enables rapid development and implementation of cryptographic algorithms.
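As a rough illustration of what an operator library with operators "at different granularities" could look like, the Python sketch below is a hypothetical example, not the authors' implementation. It assumes SHA-256 round functions as the hotspot operators (the abstract does not name specific algorithms). The names `OPERATOR_LIBRARY`, `register`, and `run_batched` are invented for illustration, and `run_batched` is only a loose CPU-side stand-in for data tensorization: it applies one operator across a batch of independent inputs and does not model VTA or its registers.

```python
from typing import Callable, Dict, List, Tuple

MASK32 = 0xFFFFFFFF

# Hypothetical operator library: name -> (granularity tag, implementation).
OPERATOR_LIBRARY: Dict[str, Tuple[str, Callable[..., int]]] = {}


def register(name: str, granularity: str) -> Callable[[Callable[..., int]], Callable[..., int]]:
    """Decorator that records an operator under a name and a granularity tag."""
    def wrap(fn: Callable[..., int]) -> Callable[..., int]:
        OPERATOR_LIBRARY[name] = (granularity, fn)
        return fn
    return wrap


# Fine-grained operators: single 32-bit word operations.
@register("rotr32", "fine")
def rotr32(x: int, n: int) -> int:
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


@register("shr32", "fine")
def shr32(x: int, n: int) -> int:
    return (x & MASK32) >> n


@register("add32", "fine")
def add32(*xs: int) -> int:
    # Addition modulo 2^32, as used throughout SHA-256.
    return sum(xs) & MASK32


# Coarse-grained operators: SHA-256 functions composed from fine-grained ones.
@register("big_sigma0", "coarse")
def big_sigma0(x: int) -> int:
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22)


@register("big_sigma1", "coarse")
def big_sigma1(x: int) -> int:
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25)


@register("small_sigma0", "coarse")
def small_sigma0(x: int) -> int:
    return rotr32(x, 7) ^ rotr32(x, 18) ^ shr32(x, 3)


@register("ch", "coarse")
def ch(x: int, y: int, z: int) -> int:
    return (x & y) ^ ((~x & MASK32) & z)


@register("maj", "coarse")
def maj(x: int, y: int, z: int) -> int:
    return (x & y) ^ (x & z) ^ (y & z)


def run_batched(name: str, *lanes: List[int]) -> List[int]:
    """Apply a library operator lane-wise over equal-length batches of inputs.

    Loose stand-in for tensorization: many independent inputs are packed
    into one call instead of issuing one call per 32-bit word.
    """
    _, fn = OPERATOR_LIBRARY[name]
    if len({len(lane) for lane in lanes}) != 1:
        raise ValueError("all lanes must be non-empty lists of the same length")
    return [fn(*args) for args in zip(*lanes)]


if __name__ == "__main__":
    print(hex(big_sigma0(1)))  # 0x40080400
    print([hex(v) for v in run_batched("big_sigma1", [1, 0])])
```

Registering coarse operators on top of fine-grained ones keeps such a library composable: a compiler could map a coarse operator to a single hardware kernel or lower it to its fine-grained parts. How the paper's multi-level compilation actually makes that choice is not described in the abstract.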

Key words: Operator abstraction, Compilation optimization, Code generation, Cryptographic algorithm

CLC number: TP311