Computer Science, 2021, Vol. 48, Issue (12): 36-42. doi: 10.11896/jsjkx.201200023
XIE Jing-ming¹, HU Wei-fang¹, HAN Lin², ZHAO Rong-cai², JING Li-na³
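Abstract: The “Songshan” supercomputer is a new-generation heterogeneous supercomputer cluster developed independently in China; both its CPUs and its DCU accelerators are domestically designed. To broaden the platform's scientific-computing ecosystem and to verify that quantum computing research can feasibly be carried out on it, this paper uses a heterogeneous programming model to implement a heterogeneous version of quantum Fourier transform (QFT) simulation on the “Songshan” system, assigning the program's computational hotspots to the DCUs. It then uses MPI to launch multiple processes on a single compute node, so that data transfer and computation on the DCU accelerators run concurrently. Finally, by overlapping computation with communication, it keeps the DCUs from sitting idle for long stretches during data transfer. The experiments achieve, for the first time on a supercomputer system, QFT simulation at the scale of 44 qubits. The results show that the heterogeneous QFT simulation makes full use of the DCU accelerators' computing resources, reaching a speedup of 11.594 over the conventional CPU version, and that it scales well on the cluster. The method provides a reference for simulating and optimizing other quantum algorithms on the “Songshan” supercomputer system.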
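The abstract does not show the simulation kernel itself. For orientation, below is a minimal serial sketch of a state-vector QFT simulation, assuming the textbook circuit (a Hadamard on each qubit, controlled-phase gates from the lower qubits, then a bit-reversal swap) and complex double-precision amplitudes. All function names are hypothetical, and this is not the authors' heterogeneous DCU/MPI implementation. Each gate sweeps the entire 2^n-entry state vector, and every amplitude pair is updated independently of the others; that per-gate sweep is the kind of hotspot loop one would offload to an accelerator. The `state_bytes` helper shows the memory footprint under the 16-bytes-per-amplitude assumption: 2^44 × 16 B = 2^48 B = 256 TiB, which is why a 44-qubit run calls for a cluster.

```python
import cmath
import math


def apply_h(psi, t):
    """Hadamard on qubit t: mix each amplitude pair whose indices differ only in bit t."""
    step = 1 << t
    s = 1 / math.sqrt(2)
    for i in range(len(psi)):
        if (i & step) == 0:  # visit each pair once, from its bit-t == 0 member
            j = i | step
            a, b = psi[i], psi[j]
            psi[i] = (a + b) * s
            psi[j] = (a - b) * s


def apply_cphase(psi, c, t, theta):
    """Controlled phase: multiply amplitudes whose bits c and t are both 1 by e^{i*theta}."""
    mask = (1 << c) | (1 << t)
    phase = cmath.exp(1j * theta)
    for i in range(len(psi)):
        if (i & mask) == mask:
            psi[i] *= phase


def reverse_bits(i, n):
    """Reverse the lowest n bits of index i (the QFT's final swap network)."""
    r = 0
    for q in range(n):
        if (i >> q) & 1:
            r |= 1 << (n - 1 - q)
    return r


def qft(psi):
    """Return the QFT of a list of 2**n amplitudes; the input list is not modified."""
    n = len(psi).bit_length() - 1
    assert len(psi) == 1 << n, "state length must be a power of two"
    psi = list(psi)
    for t in range(n - 1, -1, -1):  # most significant qubit first
        apply_h(psi, t)
        for c in range(t - 1, -1, -1):  # controlled-R_k, k = t - c + 1
            apply_cphase(psi, c, t, 2 * math.pi / 2 ** (t - c + 1))
    out = [0j] * len(psi)
    for i, amp in enumerate(psi):  # swaps = bit reversal of every index
        out[reverse_bits(i, n)] = amp
    return out


def dft_reference(psi):
    """Direct O(N^2) definition: out[y] = N^{-1/2} * sum_x psi[x] * e^{2*pi*i*x*y/N}."""
    N = len(psi)
    return [
        sum(psi[x] * cmath.exp(2j * math.pi * x * y / N) for x in range(N)) / math.sqrt(N)
        for y in range(N)
    ]


def state_bytes(n_qubits, bytes_per_amp=16):
    """Full state-vector size, assuming complex double amplitudes (16 bytes each)."""
    return (1 << n_qubits) * bytes_per_amp


if __name__ == "__main__":
    v = [complex(k % 3, (k * 7) % 5) for k in range(16)]
    err = max(abs(a - b) for a, b in zip(qft(v), dft_reference(v)))
    print("max |gate QFT - direct DFT| =", err)
    print("44-qubit state, TiB:", state_bytes(44) / 2**40)  # 256.0
```

The gate-based `qft` is checked against the O(N^2) definition only as a correctness test; the gate form is what a simulator actually runs, because each gate costs O(2^n) work. A distributed version would, in a commonly used scheme assumed here, additionally partition the vector across MPI ranks by its highest index bits and exchange amplitudes whenever a gate acts on one of those bits; that partitioning is not shown.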