Computer Science ›› 2021, Vol. 48 ›› Issue (12): 36-42. doi: 10.11896/jsjkx.201200023

• Computer Architecture

Quantum Fourier Transform Simulation Based on the “Songshan” Supercomputer System

XIE Jing-ming1, HU Wei-fang1, HAN Lin2, ZHAO Rong-cai2, JING Li-na3

  1 Information Engineering Institute, Zhengzhou University, Zhengzhou 450000, China
  2 Henan Supercomputing Center, Zhengzhou University, Zhengzhou 450000, China
  3 Zhong Yuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450001, China
  • Received: 2020-12-02 Revised: 2021-03-19 Online: 2021-12-15 Published: 2021-11-26
  • Corresponding author: HAN Lin (strollerlin@163.com)
  • Author contact: 624797981@qq.com
  • About author: XIE Jing-ming, born in 1995, postgraduate. His main research interests include high-performance parallel computation and heterogeneous computation.
    HAN Lin, born in 1978, Ph.D., associate professor, master's supervisor, is a member of China Computer Federation. His main research interests include high-performance computing and compilation optimization.
  • Supported by:
    National Key R&D Program of China (2018YFB0505000).

Abstract: The “Songshan” supercomputer system is a new-generation heterogeneous supercomputer cluster developed independently in China; both its CPUs and its DCU accelerators are domestically developed. To expand the platform's scientific computing ecosystem and to verify that quantum computing research is feasible on it, this paper uses a heterogeneous programming model to implement a heterogeneous version of quantum Fourier transform simulation on the “Songshan” supercomputer system, assigning the program's computational hotspots to the DCU. MPI is then used to launch multiple processes on a single compute node, so that data transfer and computation on the DCU accelerator proceed concurrently. Finally, overlapping computation with communication keeps the DCU from sitting idle for long periods during data transfer. The experiment implements a 44-qubit quantum Fourier transform simulation on a supercomputing system for the first time. The results show that the heterogeneous version makes full use of the DCU accelerator's computing resources, achieves a speedup of 11.594 over the traditional CPU version, and scales well on the cluster. The method provides a reference for simulating and optimizing other quantum algorithms on the “Songshan” supercomputer system.

Key words: Heterogeneous computing, Quantum Fourier transform, DCU accelerator, HIP-C, MPI, Communication hiding

CLC Number:

  • TP387
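
To make the simulated computation concrete, the following is a minimal single-process sketch of a state-vector quantum Fourier transform, written in pure Python. It is not the paper's implementation, which uses HIP-C on DCU accelerators together with MPI; all function names here are hypothetical and chosen for illustration. The sketch applies the textbook circuit (a Hadamard gate on each qubit followed by controlled phase rotations, then a final bit reversal) and checks the result against the discrete Fourier transform definition. The helper `state_vector_bytes` assumes 16 bytes per amplitude (double-precision complex), a precision the abstract does not state.

```python
import cmath
import math


def apply_hadamard(state, target):
    """Apply a Hadamard gate to qubit `target` (bit index) in place."""
    stride = 1 << target
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    for base in range(0, len(state), stride << 1):
        for low in range(base, base + stride):
            a, b = state[low], state[low + stride]
            state[low] = (a + b) * inv_sqrt2
            state[low + stride] = (a - b) * inv_sqrt2


def apply_controlled_phase(state, control, target, theta):
    """Multiply by exp(i*theta) every amplitude whose control and target bits are both 1."""
    mask = (1 << control) | (1 << target)
    phase = cmath.exp(1j * theta)
    for index in range(len(state)):
        if index & mask == mask:
            state[index] *= phase


def reverse_bits(value, width):
    """Return `value` with its lowest `width` bits in reversed order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def qft(state):
    """Return the QFT of a state vector whose length is a power of two."""
    size = len(state)
    if size == 0 or size & (size - 1):
        raise ValueError("state length must be a power of two")
    n_qubits = size.bit_length() - 1
    work = list(state)
    # Walk from the most significant qubit down to the least significant one.
    for target in range(n_qubits - 1, -1, -1):
        apply_hadamard(work, target)
        for control in range(target - 1, -1, -1):
            angle = math.pi / (1 << (target - control))
            apply_controlled_phase(work, control, target, angle)
    # The circuit leaves the qubits in reversed order; undo that here.
    result = [0j] * size
    for index, amplitude in enumerate(work):
        result[reverse_bits(index, n_qubits)] = amplitude
    return result


def dft_reference(state):
    """Direct O(N^2) evaluation of y_k = N^(-1/2) * sum_j x_j * exp(2*pi*i*j*k/N)."""
    size = len(state)
    scale = 1.0 / math.sqrt(size)
    return [
        scale * sum(state[j] * cmath.exp(2j * math.pi * j * k / size) for j in range(size))
        for k in range(size)
    ]


def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for a full state vector, assuming 16-byte complex amplitudes."""
    return (1 << n_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    raw = [complex(v) for v in range(1, 9)]  # arbitrary 3-qubit test state
    norm = math.sqrt(sum(abs(a) ** 2 for a in raw))
    psi = [a / norm for a in raw]
    deviation = max(abs(g - w) for g, w in zip(qft(psi), dft_reference(psi)))
    print("max deviation from reference DFT:", deviation)
    print("bytes for 44 qubits:", state_vector_bytes(44))
```

Each gate touches every amplitude of a vector that doubles in length with each added qubit, which is why such loops are natural candidates for offloading to an accelerator and for distributing the vector across processes. Under the 16-byte assumption a 44-qubit state vector occupies 2^48 bytes, so it has to be spread over many nodes.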