计算机科学 ›› 2025, Vol. 52 ›› Issue (9): 160-169.doi: 10.11896/jsjkx.250200083

• 高性能计算 • 上一篇    下一篇

动态旋转因子生成的流水线NTT高效硬件实现

韩迎梅1,2, 李斌1,2, 李坤1,2, 周清雷1,3, 于世梁1   

  1. 1 郑州大学计算机与人工智能学院 郑州 450001
    2 嵩山实验室 郑州 450001
    3 国家超级计算郑州中心 郑州 450001
  • 收稿日期:2025-02-20 修回日期:2025-05-15 出版日期:2025-09-15 发布日期:2025-09-11
  • 通讯作者: 李斌(iebinli@zzu.edu.cn)
  • 作者简介:(hanyingmei_hym@163.com)
  • 基金资助:
    先进密码技术与系统安全四川省重点实验室开放课题资助项目(SKLACSS-202408);嵩山实验室资助项目(241110210200)

Efficient Hardware Implementation of Pipelined NTT for Dynamic Rotation Factor Generation

HAN Yingmei1,2, LI Bin1,2, LI Kun1,2, ZHOU Qinglei1,3, YU Shiliang1   

  1. 1 College of Computer and Artificial Intelligence,Zhengzhou University,Zhengzhou 450001,China
    2 Songshan Laboratory,Zhengzhou 450001,China
    3 National Supercomputing Center in Zhengzhou,Zhengzhou 450001,China
  • Received:2025-02-20 Revised:2025-05-15 Online:2025-09-15 Published:2025-09-11
  • About author:HAN Yingmei,born in 2000,postgra-duate.Her main research interests include information security and full homomorphic cipher.
    LI Bin,born in 1986,Ph.D,lecturer.His main research interests include information security and reconfigurable computing.
  • Supported by:
    Open Fund of Advanced Cryptography and System Security Key Laboratory of Sichuan Province(SKLACSS-202408) and Songshan Laboratory Funded Project(241110210200).

摘要: 在全同态加密领域,多项式乘法的高复杂度计算一直是性能提升的难点。为加速此过程,数论变换(Number TheoreticTransform,NTT)被广泛应用,但传统基于多路径延迟交换的NTT架构存在硬件利用率不足、旋转因子开销大等缺点。针对这些问题,提出了一种基于流水线架构的NTT硬件加速器设计方案。该方案在多路径延迟交换架构的基础上进行优化,通过统一的NTT/INTT(Inverse NTT)框架,结合流水线技术实现高度并行计算。在设计过程中,优化集成的蝶形单元引入流水线友好的改进K-RED算法,灵活适配多种运算策略,加速大规模数据处理。此外,该设计方案采用动态旋转因子生成技术,旋转因子生成器(Twiddle Factor Generation,TFG)与蝶形单元紧密协作。在初始小参数阶段,从ROM读取预存旋转因子以节省逻辑资源;在参数较大时,启用TFG动态生成旋转因子并直接分配给相应的蝶形单元,实现资源高效平衡,有效降低了旋转因子的存储需求。现场可编程门阵列实现结果表明,与以往的研究相比,所提方案在全同态加密参数集的NTT上的执行速度提高了1.14~40.81倍;同时,在硬件资源占用方面,该方案在面积-时间乘积指标上减少了23%~82%。

关键词: 全同态加密, 多项式乘法, 数论变换, 流水线, 动态旋转因子

Abstract: In the field of fully homomorphic encryption,the high computational complexity of polynomial multiplication has always been a bottleneck for performance improvement.To accelerate this process,the Number Theoretic Transform(NTT) has been widely used,but traditional NTT architectures based on multipath delay commutation have several drawbacks,such as insufficient hardware utilization and large overhead of twiddle factors.To address these issues,this paper proposes a novel NTT hardware accelerator design based on a pipelined architecture.This design optimizes the multipath delay commutation architecture through a unified NTT/INTT(Inverse NTT) framework,combined with pipelining techniques to achieve highly parallel computation.During the design process,the optimized integrated butterfly units introduce a pipeline-friendly improved K-RED algorithm,which can flexibly adapt to various computational strategies and accelerate large-scale data processing.Moreover,the design employs a dynamic twiddle factor generation technique,with a Twiddle Factor Generation(TFG) unit closely collaborating with the butterfly units.In the initial small parameter stage,pre-stored twiddle factors are read from ROM to save logic resources.When parameters are larger,the TFG dynamically generates twiddle factors and directly assigns them to the corresponding butterfly units,achieving efficient resource balancing and effectively reducing the storage requirements for twiddle factors.Field Programmable Gate Array implementation results show that,compared with previous studies,the proposed solution improves the execution speed of NTT on fully homomorphic encryption parameter sets by 1.14 to 40.81 times.At the same time,in terms of hardware resource usage,the solution reduces the area-time product metric by 23% to 82%.

Key words: Fully homomorphic encryption, Polynomial multiplication, Number theoretic transform, Pipeline, Dynamic twiddle factor

中图分类号: 

  • TP309
[1]RIVEST R L,ADLEMAN L,DERTOUZOS M L.On databanks and privacy homomorphisms[M]//Foundations of Secure Computation.New York:Academic Press,1978:169-179.
[2]GENTRY C.Fully homomorphic encryption using ideal lattices[C]//Proceedings of the 41st Annual ACM Symposium on Theo-ry of Computing.ACM,2009:169-178.
[3]VIAND A,JATTKE P,HITHNAWI A.SoK:Fully homomorphic encryption compilers[C]//2021 IEEE Symposium on Security and Privacy(SP).IEEE,2021:1092-1108.
[4]CHEN Z,MA Y,CHEN T,et al.Towards efficient Kyber onFPGAs:A processor for vector of polynomials[C]//2020 25th Asia and South Pacific Design Automation Conference(ASP-DAC).IEEE,2020:247-252.
[5]ZHANG N,YANG B,CHEN C,et al.Highly efficient architecture of NewHope-NIST on FPGA using low-complexity NTT/INTT[J].IACR Transactions on Cryptographic Hardware and Embedded Systems,2020(2):49-72.
[6]MU J,REN Y,WANG W,et al.Scalable and conflict-free NTT hardware accelerator design:Methodology,proof,and implementation[J].IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,2022,42(5):1504-1517.
[7]DUONG-NGOC P,LEE H.Configurable mixed-radix numbertheoretic transform architecture for lattice-based cryptography[J].IEEE Access,2022,10:12732-12741.
[8]ZHANG Y,WANG C,KHALID A,et al.Ultra high-speed polynomial multiplications for lattice-based cryptography on FPGAs[J].IEEE Transactions on Emerging Topics in Computing,2022,10(4):1993-2005.
[9]YE Z,CHEUNG R C C,HUANG K.PipeNTT:A pipelinednumber theoretic transform architecture[J].IEEE Transactions on Circuits and Systems II:Express Briefs,2022,69(10):4068-4072.
[10]RENTERÍA-MEJÍA C P,VELASCO-MEDINA J.Hardwaredesign of an NTT-based polynomial multiplier[C]//2014 IX Southern Conference on Programmable Logic(SPL).IEEE,2014:1-5.
[11]RENTERÍA-MEJÍA C P,VELASCO-MEDINA J.High-throughput ring-LWE cryptoprocessors[J].IEEE Transactions on Very Large Scale Integration(VLSI) Systems,2017,25(8):2332-2345.
[12]TAN W,WANG A,LAO Y,et al.Pipelined high-throughput NTT architecture for lattice-based cryptography[C]//2021 Asian Hardware Oriented Security and Trust Symposium(AsianHOST).IEEE,2021:1-4.
[13]LYUBASHEVSKY V,MICCIANCIO D,PEIKERT C,et al.SWIFFT:A modest proposal for FFT hashing[C]//Fast Software Encryption:15th International Workshop.Berlin:Sprin-ger,2008:54-72.
[14]HIRNER F,MERT A C,ROY S S.Proteus:A Pipelined NTT Architecture Generator[J].IEEE Transactions on Very Large Scale Integration(VLSI) Systems,2024,32(7):1228-1238.
[15]ZHAO Y,XIE R,XIN G,et al.A high-performance domain-specific processor with matrix extension of RISC-V for module-LWE applications[J].IEEE Transactions on Circuits and Systems I:Regular Papers,2022,69(7):2871-2884.
[16]RIAZI M S,LAINE K,PELTON B,et al.HEAX:An architecture for computing on encrypted data[C]//Proceedings of the Twenty-fifth International Conference on Architectural Support for Programming Languages and Operating Systems.2020:1295-1309.
[17]PALUDO R,SOUSA L.NTT architecture for a linux-readyRISC-V fully-homomorphic encryption accelerator[J].IEEE Transactions on Circuits and Systems I:Regular Papers,2022,69(7):2669-2682.
[18]KURNIAWAN S,DUONG-NGOC P,LEE H.Configurablememory-based NTT architecture for homomorphic encryption[J].IEEE Transactions on Circuits and Systems II:Express Briefs,2023,70(10):3942-3946.
[19]DUONG-NGOC P,KWON S,YOO D,et al.Area-efficient number theoretic transform architecture for homomorphic encryption[J].IEEE Transactions on Circuits and Systems I:Regular Papers,2022,70(3):1270-1283.
[20]SAMARDZIC N,FELDMANN A,KRASTEV A,et al.F1:Afast and programmable accelerator for fully homomorphic encryption[C]//MICRO-54:54th Annual IEEE/ACM International Symposium on Microarchitecture.2021:238-252.
[21]LONGA P,NAEHRIG M.Speeding up the number theoretictransform for faster ideal lattice-based cryptography[C]//Cryptology and Network Security:15th International Conference.Springer,2016:124-139.
[22]NI Z,KHALID A,LIU W,et al.Towards a lightweight CRYSTALS-Kyber in FPGAs:An ultra-lightweight BRAM-free NTT core[C]//Proceedings of the IEEE International Symposium on Circuits and Systems(ISCAS).2023:1-5.
[23]JATI A,GUPTA N,CHATTOPADHYAY A,et al.A configurable CRYSTALS-Kyber hardware implementation with side-channel protection[J].ACM Transactions on Embedded Computing Systems,2024,23(2):1-25.
[24]NGUYEN D T,DANG V B,GAJ K.High-level synthesis in implementing and benchmarking number theoretic transform in lattice-based post-quantum cryptography using software/hardware codesign[C]//Applied Reconfigurable Computing.Architectures,Tools,and Applications:16th International Sympo-sium.Springer,2020:247-257.
[25]DI MATTEO S,GERFO M L,SAPONARA S.VLSI Design and FPGA implementation of an NTT hardware accelerator for Homomorphic seal-embedded library[J].IEEE Access,2023,11(1):72498-72508.
[26]GENG Y,HU X,LI M,et al.Rethinking parallel memory access pattern in number theoretic transform design[J].IEEE Transactions on Circuits and Systems II:Express Briefs,2023,70(5):1689-1693.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!