Computer Science, 2021, Vol. 48, Issue (12): 36-42. doi: 10.11896/jsjkx.201200023

• Computer Architecture

Quantum Fourier Transform Simulation Based on “Songshan” Supercomputer System

XIE Jing-ming1, HU Wei-fang1, HAN Lin2, ZHAO Rong-cai2, JING Li-na3

  1 Information Engineering Institute, Zhengzhou University, Zhengzhou 450000, China
  2 Henan Supercomputing Center, Zhengzhou University, Zhengzhou 450000, China
  3 Zhong Yuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450001, China
  • Received: 2020-12-02 Revised: 2021-03-19 Online: 2021-12-15 Published: 2021-11-26
  • Corresponding author: HAN Lin (strollerlin@163.com)
  • Author e-mail: 624797981@qq.com
  • About author: XIE Jing-ming, born in 1995, postgraduate. His main research interests include high-performance parallel computation and heterogeneous computation.
    HAN Lin, born in 1978, Ph.D., associate professor, master's supervisor, is a member of China Computer Federation. His main research interests include high-performance computing and compilation optimization.
  • Supported by:
    National Key R&D Program of China (2018YFB0505000).

Abstract: The “Songshan” supercomputer system is a new-generation heterogeneous supercomputer cluster developed independently in China; both its CPUs and its DCU accelerators are domestically developed. To expand the platform's scientific computing ecosystem and to verify the feasibility of conducting quantum computing research on it, this paper uses a heterogeneous programming model to implement a heterogeneous version of quantum Fourier transform simulation on the “Songshan” supercomputer system, assigning the computational hotspots of the program to the DCU. MPI is then used to launch multiple processes on a single compute node so that DCU data transfer and computation proceed concurrently. Finally, hiding communication behind computation keeps the DCU from sitting idle for long periods during data transfer. The experiment realizes, for the first time on a supercomputing system, a quantum Fourier transform simulation at the 44-qubit scale. The results show that the heterogeneous version makes full use of the DCU accelerator's computing resources, achieves a speedup of 11.594 over the traditional CPU version, and scales well on the cluster. The method provides a reference for simulating and optimizing other quantum algorithms on the “Songshan” supercomputer system.

Key words: Communication hiding, DCU accelerator, Heterogeneous computing, HIP-C, MPI, Quantum Fourier transform
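
To make the simulated workload concrete, the following is a minimal state-vector sketch of a quantum Fourier transform in plain Python. It is not the authors' HIP-C/MPI implementation: the function names, the textbook gate decomposition (Hadamard gates, controlled phase rotations, final qubit swaps), and the assumption of 16 bytes per amplitude (double-precision complex) are illustrative choices that do not come from the paper. The loops over basis-state indices are the kind of data-parallel hotspot that the abstract describes offloading to the DCU.

```python
import cmath
import math


def apply_hadamard(state, q):
    """Apply a Hadamard gate to qubit q (bit q of the basis-state index)."""
    bit = 1 << q
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    for i in range(len(state)):
        if (i & bit) == 0:
            a, b = state[i], state[i | bit]
            state[i] = (a + b) * inv_sqrt2
            state[i | bit] = (a - b) * inv_sqrt2


def apply_controlled_phase(state, control, target, theta):
    """Multiply amplitudes whose control and target bits are both 1 by e^{i*theta}."""
    mask = (1 << control) | (1 << target)
    phase = cmath.exp(1j * theta)
    for i in range(len(state)):
        if (i & mask) == mask:
            state[i] *= phase


def swap_qubits(state, q1, q2):
    """Exchange the roles of two distinct qubits q1 and q2."""
    b1, b2 = 1 << q1, 1 << q2
    for i in range(len(state)):
        if (i & b1) and not (i & b2):
            j = (i ^ b1) | b2  # same index with bit q1 cleared and bit q2 set
            state[i], state[j] = state[j], state[i]


def qft(state, n_qubits):
    """In-place QFT on a list of 2**n_qubits complex amplitudes."""
    for q in range(n_qubits - 1, -1, -1):  # most significant qubit first
        apply_hadamard(state, q)
        for c in range(q - 1, -1, -1):
            apply_controlled_phase(state, c, q, math.pi / (1 << (q - c)))
    for q in range(n_qubits // 2):  # reverse the qubit order
        swap_qubits(state, q, n_qubits - 1 - q)
    return state


def reference_dft(state):
    """Direct O(N^2) evaluation of the same unitary, used only for checking."""
    n = len(state)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(state[x] * cmath.exp(2j * math.pi * x * y / n) for x in range(n))
        for y in range(n)
    ]


def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for a full state vector; 16 bytes per amplitude is an assumption."""
    return (1 << n_qubits) * bytes_per_amplitude
```

Under the 16-byte assumption, `state_vector_bytes(44)` evaluates to 2**48 bytes (256 TiB). This suggests why a simulation at this scale would need the state vector distributed across many nodes, with data transfer overlapped with computation as the abstract describes.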

CLC Number: TP387