Computer Science ›› 2025, Vol. 52 ›› Issue (9): 160-169.doi: 10.11896/jsjkx.250200083

• High Performance Computing • Previous Articles     Next Articles

Efficient Hardware Implementation of Pipelined NTT for Dynamic Rotation Factor Generation

HAN Yingmei1,2, LI Bin1,2, LI Kun1,2, ZHOU Qinglei1,3, YU Shiliang1   

  1. 1 College of Computer and Artificial Intelligence,Zhengzhou University,Zhengzhou 450001,China
    2 Songshan Laboratory,Zhengzhou 450001,China
    3 National Supercomputing Center in Zhengzhou,Zhengzhou 450001,China
  • Received:2025-02-20 Revised:2025-05-15 Online:2025-09-15 Published:2025-09-11
  • About author:HAN Yingmei,born in 2000,postgra-duate.Her main research interests include information security and full homomorphic cipher.
    LI Bin,born in 1986,Ph.D,lecturer.His main research interests include information security and reconfigurable computing.
  • Supported by:
    Open Fund of Advanced Cryptography and System Security Key Laboratory of Sichuan Province(SKLACSS-202408) and Songshan Laboratory Funded Project(241110210200).

Abstract: In the field of fully homomorphic encryption,the high computational complexity of polynomial multiplication has always been a bottleneck for performance improvement.To accelerate this process,the Number Theoretic Transform(NTT) has been widely used,but traditional NTT architectures based on multipath delay commutation have several drawbacks,such as insufficient hardware utilization and large overhead of twiddle factors.To address these issues,this paper proposes a novel NTT hardware accelerator design based on a pipelined architecture.This design optimizes the multipath delay commutation architecture through a unified NTT/INTT(Inverse NTT) framework,combined with pipelining techniques to achieve highly parallel computation.During the design process,the optimized integrated butterfly units introduce a pipeline-friendly improved K-RED algorithm,which can flexibly adapt to various computational strategies and accelerate large-scale data processing.Moreover,the design employs a dynamic twiddle factor generation technique,with a Twiddle Factor Generation(TFG) unit closely collaborating with the butterfly units.In the initial small parameter stage,pre-stored twiddle factors are read from ROM to save logic resources.When parameters are larger,the TFG dynamically generates twiddle factors and directly assigns them to the corresponding butterfly units,achieving efficient resource balancing and effectively reducing the storage requirements for twiddle factors.Field Programmable Gate Array implementation results show that,compared with previous studies,the proposed solution improves the execution speed of NTT on fully homomorphic encryption parameter sets by 1.14 to 40.81 times.At the same time,in terms of hardware resource usage,the solution reduces the area-time product metric by 23% to 82%.

Key words: Fully homomorphic encryption, Polynomial multiplication, Number theoretic transform, Pipeline, Dynamic twiddle factor

CLC Number: 

  • TP309
[1]RIVEST R L,ADLEMAN L,DERTOUZOS M L.On databanks and privacy homomorphisms[M]//Foundations of Secure Computation.New York:Academic Press,1978:169-179.
[2]GENTRY C.Fully homomorphic encryption using ideal lattices[C]//Proceedings of the 41st Annual ACM Symposium on Theo-ry of Computing.ACM,2009:169-178.
[3]VIAND A,JATTKE P,HITHNAWI A.SoK:Fully homomorphic encryption compilers[C]//2021 IEEE Symposium on Security and Privacy(SP).IEEE,2021:1092-1108.
[4]CHEN Z,MA Y,CHEN T,et al.Towards efficient Kyber onFPGAs:A processor for vector of polynomials[C]//2020 25th Asia and South Pacific Design Automation Conference(ASP-DAC).IEEE,2020:247-252.
[5]ZHANG N,YANG B,CHEN C,et al.Highly efficient architecture of NewHope-NIST on FPGA using low-complexity NTT/INTT[J].IACR Transactions on Cryptographic Hardware and Embedded Systems,2020(2):49-72.
[6]MU J,REN Y,WANG W,et al.Scalable and conflict-free NTT hardware accelerator design:Methodology,proof,and implementation[J].IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,2022,42(5):1504-1517.
[7]DUONG-NGOC P,LEE H.Configurable mixed-radix numbertheoretic transform architecture for lattice-based cryptography[J].IEEE Access,2022,10:12732-12741.
[8]ZHANG Y,WANG C,KHALID A,et al.Ultra high-speed polynomial multiplications for lattice-based cryptography on FPGAs[J].IEEE Transactions on Emerging Topics in Computing,2022,10(4):1993-2005.
[9]YE Z,CHEUNG R C C,HUANG K.PipeNTT:A pipelinednumber theoretic transform architecture[J].IEEE Transactions on Circuits and Systems II:Express Briefs,2022,69(10):4068-4072.
[10]RENTERÍA-MEJÍA C P,VELASCO-MEDINA J.Hardwaredesign of an NTT-based polynomial multiplier[C]//2014 IX Southern Conference on Programmable Logic(SPL).IEEE,2014:1-5.
[11]RENTERÍA-MEJÍA C P,VELASCO-MEDINA J.High-throughput ring-LWE cryptoprocessors[J].IEEE Transactions on Very Large Scale Integration(VLSI) Systems,2017,25(8):2332-2345.
[12]TAN W,WANG A,LAO Y,et al.Pipelined high-throughput NTT architecture for lattice-based cryptography[C]//2021 Asian Hardware Oriented Security and Trust Symposium(AsianHOST).IEEE,2021:1-4.
[13]LYUBASHEVSKY V,MICCIANCIO D,PEIKERT C,et al.SWIFFT:A modest proposal for FFT hashing[C]//Fast Software Encryption:15th International Workshop.Berlin:Sprin-ger,2008:54-72.
[14]HIRNER F,MERT A C,ROY S S.Proteus:A Pipelined NTT Architecture Generator[J].IEEE Transactions on Very Large Scale Integration(VLSI) Systems,2024,32(7):1228-1238.
[15]ZHAO Y,XIE R,XIN G,et al.A high-performance domain-specific processor with matrix extension of RISC-V for module-LWE applications[J].IEEE Transactions on Circuits and Systems I:Regular Papers,2022,69(7):2871-2884.
[16]RIAZI M S,LAINE K,PELTON B,et al.HEAX:An architecture for computing on encrypted data[C]//Proceedings of the Twenty-fifth International Conference on Architectural Support for Programming Languages and Operating Systems.2020:1295-1309.
[17]PALUDO R,SOUSA L.NTT architecture for a linux-readyRISC-V fully-homomorphic encryption accelerator[J].IEEE Transactions on Circuits and Systems I:Regular Papers,2022,69(7):2669-2682.
[18]KURNIAWAN S,DUONG-NGOC P,LEE H.Configurablememory-based NTT architecture for homomorphic encryption[J].IEEE Transactions on Circuits and Systems II:Express Briefs,2023,70(10):3942-3946.
[19]DUONG-NGOC P,KWON S,YOO D,et al.Area-efficient number theoretic transform architecture for homomorphic encryption[J].IEEE Transactions on Circuits and Systems I:Regular Papers,2022,70(3):1270-1283.
[20]SAMARDZIC N,FELDMANN A,KRASTEV A,et al.F1:Afast and programmable accelerator for fully homomorphic encryption[C]//MICRO-54:54th Annual IEEE/ACM International Symposium on Microarchitecture.2021:238-252.
[21]LONGA P,NAEHRIG M.Speeding up the number theoretictransform for faster ideal lattice-based cryptography[C]//Cryptology and Network Security:15th International Conference.Springer,2016:124-139.
[22]NI Z,KHALID A,LIU W,et al.Towards a lightweight CRYSTALS-Kyber in FPGAs:An ultra-lightweight BRAM-free NTT core[C]//Proceedings of the IEEE International Symposium on Circuits and Systems(ISCAS).2023:1-5.
[23]JATI A,GUPTA N,CHATTOPADHYAY A,et al.A configurable CRYSTALS-Kyber hardware implementation with side-channel protection[J].ACM Transactions on Embedded Computing Systems,2024,23(2):1-25.
[24]NGUYEN D T,DANG V B,GAJ K.High-level synthesis in implementing and benchmarking number theoretic transform in lattice-based post-quantum cryptography using software/hardware codesign[C]//Applied Reconfigurable Computing.Architectures,Tools,and Applications:16th International Sympo-sium.Springer,2020:247-257.
[25]DI MATTEO S,GERFO M L,SAPONARA S.VLSI Design and FPGA implementation of an NTT hardware accelerator for Homomorphic seal-embedded library[J].IEEE Access,2023,11(1):72498-72508.
[26]GENG Y,HU X,LI M,et al.Rethinking parallel memory access pattern in number theoretic transform design[J].IEEE Transactions on Circuits and Systems II:Express Briefs,2023,70(5):1689-1693.
[1] QIAN Zekai, DING Xiaoou, SUN Zhe, WANG Hongzhi, ZHANG Yan. Intelligent Evidence Set Selection Method for Diverse Data Cleaning Tasks [J]. Computer Science, 2024, 51(8): 124-132.
[2] ZHONG Zhenyu, LIN Yongliang, WANG Haotian, LI Dongwen, SUN Yufei, ZHANG Yuzhi. Automatic Pipeline Parallel Training Framework for General-purpose Computing Devices [J]. Computer Science, 2024, 51(12): 129-136.
[3] DENG Shengnan, LUO Taiyu, HUANG Jingcai, REN Yuqing, SONG Wei, SU Chang, LEI Lili, HU Guanghui, XU Hong. Design and Implementation of Natural Gas Intelligent Scheduling Computer Platform System [J]. Computer Science, 2023, 50(6A): 220700258-7.
[4] ZHANG Bolin, LI Bin, YAN Yunfei, WEI Yuanxin, ZHOU Qinglei. ZUC High Performance Data Encryption Scheme Based on FPGA [J]. Computer Science, 2023, 50(11): 374-382.
[5] FU Si-qing, LI Tie-jun, ZHANG Jian-min. Architecture Design for Particle Transport Code Acceleration [J]. Computer Science, 2022, 49(6): 81-88.
[6] QIN Xiao-yue, HUANG Ru-wei, YANG Bo. NTRU Type Fully Homomorphic Encryption Scheme over Prime Power Cyclotomic Rings [J]. Computer Science, 2022, 49(5): 341-346.
[7] ZHU Zong-wu, HUANG Ru-wei. Secure Multi-party Computing Protocol Based on Efficient Fully Homomorphic Encryption [J]. Computer Science, 2022, 49(11): 345-350.
[8] LIU Dan, GUO Shao-zhong, HAO Jiang-wei, XU Jin-chen. Implementation of Transcendental Functions on Vectors Based on SIMD Extensions [J]. Computer Science, 2021, 48(6): 26-33.
[9] ZHANG Yuan-ming, YU Jia-rui, JIANG Jian-bo, LU Jia-wei, XIAO Gang. Intermediate Data Transmission Pipeline Optimization Mechanism for MapReduce Framework [J]. Computer Science, 2021, 48(2): 41-46.
[10] WANG Guo-peng, YANG Jian-xin, YIN Fei, JIANG Sheng-jian. Computing Resources Allocation with Load Balance in Modern Processor [J]. Computer Science, 2020, 47(8): 41-48.
[11] CHEN Xiao-jie,ZHOU Qing-lei,LI Bin. Energy-efficient Password Recovery Method for 7-Zip Document Based on FPGA [J]. Computer Science, 2020, 47(1): 321-328.
[12] LI Meng-tian, HU Bin. RLWE-based Fully Homomorphic Encryption Scheme with Batch Technique [J]. Computer Science, 2019, 46(3): 209-216.
[13] SHI Jing-qi, YANG Geng, SUN Yan-jun, BAI Shuang-jie and MIN Zhao-e. Efficient Parallel Algorithm of Fully Homomorphic Encryption Supporting Operation of Floating-point Number [J]. Computer Science, 2018, 45(5): 116-122.
[14] MAO He-feng, HU Bin. Homomorphic Evaluation of Lightweight Block Cipher over Integers [J]. Computer Science, 2018, 45(11): 169-175.
[15] WU Qi, WANG Xing-wei, HUANG Min. OpenFlow Switch Packets Pipeline Processing Mechanism Based on SDN [J]. Computer Science, 2018, 45(10): 295-299.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!