Computer Science ›› 2022, Vol. 49 ›› Issue (10): 74-82. doi: 10.11896/jsjkx.210900137

• High Performance Computing •

Implementation of FPGA-based High-performance and Scalable SM4-GCM Algorithm

ZHAI Jia-qi1, LI Bin1, ZHOU Qing-lei1,2, CHEN Xiao-jie2   

  1 School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  2 State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
  • Received:2021-09-16 Revised:2022-03-14 Online:2022-10-15 Published:2022-10-13
  • About author: ZHAI Jia-qi, born in 1998, postgraduate. His main research interests include high-performance computing and information security.
    LI Bin, born in 1986, Ph.D., lecturer. His main research interests include high-performance computing and information security.
  • Supported by:
    National Natural Science Foundation of China (61702518) and National Key R&D Program “Public Safety Risk Prevention and Control and Emergency Technology Assembly” Key Special Project (2018XXXXXXX01).

Abstract: With the rapid development of big data and 5G technology, information encryption in high-speed communication systems has become a research hotspot. Two questions are central: how to increase data throughput, and how to make encryption algorithms easier to adapt to different application scenarios, while still ensuring strong data security. Software implementations of the SM4-GCM algorithm have low throughput and are difficult to apply in changing 5G and big data scenarios. To address this, the paper exploits the reconfigurability of FPGAs and analyzes the characteristics of the SM4-GCM algorithm, using the Mastrovito, Karatsuba and fast reduction algorithms to design two high-performance, scalable circuit structures with separated data and control paths. Full pipelining and four-way parallelism are used to accelerate SM4-GCM. The resulting designs achieve high throughput while preserving security, and can be ported flexibly to various application scenarios. Experimental results show that the two proposed schemes reach throughputs of 28.16 Gbps and 28.8 Gbps, respectively, for a single SM4-GCM module, outperforming similar published designs in both performance and scalability.
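
The abstract names Karatsuba multiplication and fast reduction as building blocks for the GCM authentication path. As a rough software illustration of those two ideas, the sketch below multiplies two elements of GF(2^128) with a one-level Karatsuba split and then reduces modulo x^128 + x^7 + x^2 + x + 1, the field polynomial used by GCM. It is not the paper's circuit: it does not model the Mastrovito matrix form, pipelining or parallelism, and it assumes a plain bit order (bit i is the coefficient of x^i) rather than GCM's bit-reflected convention. All function names are hypothetical.

```python
# Illustrative sketch only, under the assumptions stated above.
# Polynomials over GF(2) are stored as Python ints: bit i = coefficient of x^i.

MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less multiplication (XOR replaces addition)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_clmul128(a: int, b: int) -> int:
    """One-level Karatsuba: three 64-bit carry-less products instead of four."""
    a_lo, a_hi = a & MASK64, a >> 64
    b_lo, b_hi = b & MASK64, b >> 64
    lo = clmul(a_lo, b_lo)
    hi = clmul(a_hi, b_hi)
    # (a_lo + a_hi)(b_lo + b_hi) + lo + hi = a_lo*b_hi + a_hi*b_lo over GF(2)
    mid = clmul(a_lo ^ a_hi, b_lo ^ b_hi) ^ lo ^ hi
    return (hi << 128) ^ (mid << 64) ^ lo


def reduce_gcm(p: int) -> int:
    """Fold bits above x^127 back in, using x^128 = x^7 + x^2 + x + 1."""
    while p >> 128:
        hi = p >> 128
        p = (p & MASK128) ^ hi ^ (hi << 1) ^ (hi << 2) ^ (hi << 7)
    return p


def gf128_mul(a: int, b: int) -> int:
    """Multiply two field elements: Karatsuba product, then reduction."""
    return reduce_gcm(karatsuba_clmul128(a, b))
```

For example, `gf128_mul(1 << 127, 2)` computes x^127 · x = x^128, which reduces to x^7 + x^2 + x + 1, so the call returns `0x87`. In hardware the same reduction is typically a fixed XOR network rather than a loop; the loop here only keeps the sketch short.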

Key words: SM4, Galois/Counter Mode, FPGA, High throughput rate, Scalable

CLC Number: TP309