Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230800138-9. doi: 10.11896/jsjkx.230800138

• Computer Software & Architecture •

FPGA Efficient Scalability Optimization of Dilithium

YAN Yunfei, LI Bin, WEI Yuanxin, ZHANG Bolin, MA Tianyi, ZHOU Qinglei   

  1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  • Published: 2024-06-06
  • About author: YAN Yunfei, born in 1999, postgraduate. His main research interests include post-quantum cryptography and high-performance computing.
    LI Bin, born in 1986, Ph.D., lecturer. His main research interests include high-performance computing and information security.
  • Supported by:
    Key Science and Technology Research Project of Henan Province, China (232102211055) and Research Project of Key Laboratory of Network Cryptography Technology of Henan Province, China (LNCT2022-A14).

Abstract: To improve the operational efficiency of Dilithium in practical applications, an efficient field programmable gate array (FPGA) implementation of the Dilithium algorithm is proposed. The design is optimized in several respects. First, the Karatsuba-Ofman algorithm (KOA) is combined with a fast modular reduction algorithm to build a fast modular multiplication unit, which accelerates the large number of polynomial multiplications implemented with the number theoretic transform (NTT). Second, multiple RAM accesses are employed for polynomial coefficient operations, and a coefficient reading strategy tailored to the characteristics of the Dilithium algorithm is designed so that polynomial coefficients are read from RAM quickly and accurately. Third, for the sampling and hashing tasks in the scheme, the characteristics of the SHAKE family of algorithms are analyzed, leading to a low-latency, scalable Keccak hardware architecture that executes different SHAKE algorithms according to an input signal. Experimental results show that the operating frequency of the proposed design is increased by 60.7% to 131.9%, while hardware resource consumption and execution efficiency remain balanced.
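
As a software illustration of the first optimization, the following minimal Python sketch pairs one Karatsuba-Ofman split with a shift-and-add reduction modulo the Dilithium prime q = 8380417 = 2^23 - 2^13 + 1. It is not the paper's hardware design: the function names (`karatsuba`, `reduce_q`, `mul_mod`), the single-level 12-bit split, and the iterative folding loop are assumptions chosen for clarity. The reduction relies only on the identity 2^23 ≡ 2^13 - 1 (mod q).

```python
Q = 8380417  # Dilithium modulus: 2**23 - 2**13 + 1


def karatsuba(a: int, b: int, half: int = 12) -> int:
    """Multiply two non-negative integers with one Karatsuba-Ofman split.

    Each operand is cut at bit position `half` (an assumed split point),
    so three smaller multiplications replace the four of the schoolbook method.
    """
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    hi = a_hi * b_hi
    lo = a_lo * b_lo
    # The cross term is recovered from one extra product.
    mid = (a_hi + a_lo) * (b_hi + b_lo) - hi - lo
    return (hi << (2 * half)) + (mid << half) + lo


def reduce_q(x: int) -> int:
    """Reduce a non-negative integer modulo Q without division.

    Uses 2**23 = Q + 2**13 - 1, so the bits above position 23 can be
    folded back in with shifts, one subtraction, and one addition.
    """
    while x >> 23:
        hi, lo = x >> 23, x & 0x7FFFFF
        x = (hi << 13) - hi + lo  # hi * (2**13 - 1) + lo
    # Here x < 2**23, so at most one subtraction of Q is needed.
    return x - Q if x >= Q else x


def mul_mod(a: int, b: int) -> int:
    """Modular multiplication of two coefficients in [0, Q)."""
    return reduce_q(karatsuba(a, b))
```

In this sketch `mul_mod(a, b)` agrees with `(a * b) % Q` for coefficients in [0, Q). A hardware unit would typically unroll the folding loop into a fixed number of adder stages rather than iterate.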

Key words: Dilithium algorithm, FPGA, NTT, Hardware implementation

CLC Number: TP309.7