Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230800138-9. doi: 10.11896/jsjkx.230800138

• Computer Software & Architecture •

FPGA Efficient Scalability Optimization of Dilithium

YAN Yunfei, LI Bin, WEI Yuanxin, ZHANG Bolin, MA Tianyi, ZHOU Qinglei

  1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  • Published: 2024-06-06
  • Corresponding author: LI Bin (iebinli@zzu.edu.cn)
  • About author (994982837@qq.com): YAN Yunfei, born in 1999, postgraduate. His main research interests include post-quantum cryptography and high-performance computing.
    LI Bin, born in 1986, Ph.D., lecturer. His main research interests include high-performance computing and information security.
  • Supported by:
    Key Science and Technology Research Project of Henan Province, China (232102211055) and Research Project of Key Laboratory of Network Cryptography Technology of Henan Province, China (LNCT2022-A14).

Abstract: To improve the runtime efficiency of Dilithium in practical applications, an efficient and scalable field programmable gate array (FPGA) implementation of the Dilithium algorithm is proposed. The optimizations are as follows. The Karatsuba-Ofman algorithm (KOA) is combined with a fast modular reduction algorithm to form a fast modular multiplication unit, which accelerates the large number of polynomial multiplications carried out through the number theoretic transform (NTT). Multiple RAMs (random access memories) store the polynomial coefficients involved in the computation, and a coefficient reading strategy tailored to the characteristics of Dilithium is designed so that coefficients are read from RAM quickly and correctly. For the sampling and hashing tasks in the scheme, the characteristics of the SHAKE family of algorithms are analyzed, and a low-latency, scalable Keccak hardware architecture is designed that executes different SHAKE algorithms depending on the input signal. Experimental results show that the proposed scheme achieves an operating frequency 60.7%~131.9% higher than other schemes while balancing hardware resource consumption and execution efficiency.

Key words: Dilithium algorithm, FPGA, NTT, Hardware implementation
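
The fast modular multiplication unit described in the abstract can be illustrated with a small software sketch. It is not the authors' hardware design. It assumes the standard Dilithium modulus q = 8380417 = 2^23 - 2^13 + 1, a one-level Karatsuba-Ofman split at 12 bits, and a folding reduction based on 2^23 ≡ 2^13 - 1 (mod q). The function names `karatsuba_mul`, `reduce_q`, and `mod_mul` are hypothetical and chosen only for illustration.

```python
Q = 8380417  # Dilithium modulus, equal to 2**23 - 2**13 + 1

def karatsuba_mul(a, b, half_bits=12):
    """One-level Karatsuba-Ofman product: three half-width multiplies instead of four.

    The 12-bit split point is an assumption made for this sketch.
    """
    mask = (1 << half_bits) - 1
    a_hi, a_lo = a >> half_bits, a & mask
    b_hi, b_lo = b >> half_bits, b & mask
    hi = a_hi * b_hi
    lo = a_lo * b_lo
    # The cross term a_hi*b_lo + a_lo*b_hi is recovered from a single multiply.
    mid = (a_hi + a_lo) * (b_hi + b_lo) - hi - lo
    return (hi << (2 * half_bits)) + (mid << half_bits) + lo

def reduce_q(x):
    """Reduce a non-negative integer modulo Q using shifts and adds only.

    Because 2**23 ≡ 2**13 - 1 (mod Q), the bits above position 23 can be
    folded back as hi * (2**13 - 1) until the value fits in 23 bits.
    """
    while x >> 23:
        hi, lo = x >> 23, x & ((1 << 23) - 1)
        x = (hi << 13) - hi + lo
    # Now 0 <= x < 2**23, so one conditional subtraction is enough.
    return x - Q if x >= Q else x

def mod_mul(a, b):
    """Multiply two coefficients modulo Q."""
    return reduce_q(karatsuba_mul(a, b))
```

In this sketch the reduction avoids division entirely, which is the property that makes the special form of q attractive in hardware; a real FPGA unit would unroll the folding loop into a fixed number of adder stages and pipeline it.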

CLC Number: 

  • TP309.7