Computer Science ›› 2023, Vol. 50 ›› Issue (6): 1-9. doi: 10.11896/jsjkx.220700162

• High Performance Computing •

Many-core Optimization Method for the Calculation of Ab initio Polarizability

LUO Haiwen (罗海文), WU Yangjun (吴扬俊), SHANG Honghui (商红慧)

  1. State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2022-07-18 Revised: 2022-11-26 Online: 2023-06-15 Published: 2023-06-06
  • Corresponding author: SHANG Honghui (shanghonghui@ict.ac.cn)
  • About author: LUO Haiwen (luohaiwen20g@ict.ac.cn), born in 1998, postgraduate, is a member of China Computer Federation. His main research interests include high performance computing and parallel software. SHANG Honghui, born in 1984, Ph.D., associate professor. Her main research interests include the development of first-principles methods and their applications on high-performance computer systems.
  • Supported by:
    National Key Research and Development Program of China (2020YFB1709500) and National Natural Science Foundation of China (22003073).

Abstract: Density-functional perturbation theory (DFPT), which is based on quantum mechanics, can be used to calculate a variety of physicochemical properties of molecules and materials and is now widely used in research on new materials. Meanwhile, heterogeneous many-core processor architectures are becoming the mainstream of supercomputing. Redesigning and optimizing DFPT programs for heterogeneous many-core processors to improve their computational efficiency is therefore of great importance for the computation of physicochemical properties and their scientific applications. In this work, the computation of the first-order response density and the first-order response Hamiltonian matrix in DFPT is optimized for a many-core processor architecture and verified on the new-generation Sunway processors. The optimization techniques are loop tiling, discrete memory access processing, and collaborative reduction: loop tiling divides the work into tasks that the many cores execute in parallel; discrete memory access processing converts discrete (scattered) accesses into more efficient contiguous memory accesses; and collaborative reduction resolves write conflicts. Experimental results show that, on one core group, the performance of the optimized program improves by a factor of 8.2 to 74.4 over the unoptimized program, and that the optimized program has good strong and weak scalability.

Key words: Density-functional perturbation theory, First-principles calculation, High-performance computing, New-generation Sunway heterogeneous many-core processor
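
As a rough illustration of the three techniques named in the abstract, the Python sketch below tiles a loop over grid points, gathers scattered inputs into a contiguous local buffer, and accumulates into per-worker private buffers that are merged at the end. It is not the authors' Sunway implementation: the function names (`tiles`, `gather`, `accumulate`), the round-robin tile assignment, and the toy data layout are all assumptions made for illustration, and the "workers" here run one after another rather than on real cores.

```python
def tiles(n, tile):
    """Loop tiling: split range(n) into contiguous (start, stop) chunks."""
    return [(start, min(start + tile, n)) for start in range(0, n, tile)]


def gather(values, indices):
    """Copy scattered entries of `values` into one contiguous local buffer."""
    return [values[i] for i in indices]


def accumulate(weights, basis_index, coeffs, n_basis, tile=4, n_workers=3):
    """Sum weights[p] * coeffs[basis_index[p]] into slot basis_index[p]."""
    # One private accumulator per (simulated) worker, so no two workers
    # ever write to the same memory location.
    private = [[0.0] * n_basis for _ in range(n_workers)]

    for t, (start, stop) in enumerate(tiles(len(weights), tile)):
        worker = t % n_workers            # assumed round-robin assignment
        idx = basis_index[start:stop]     # scattered targets of this tile
        local = gather(coeffs, idx)       # contiguous copy of needed inputs
        for k, j in enumerate(idx):
            private[worker][j] += weights[start + k] * local[k]

    # Reduction step: merge the private accumulators into the final result.
    result = [0.0] * n_basis
    for buf in private:
        for j in range(n_basis):
            result[j] += buf[j]
    return result
```

Calling `accumulate(weights, basis_index, coeffs, n_basis)` on small inputs gives the same totals as a plain serial loop over all grid points; the private buffers only change where the partial sums are stored before the final merge.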

CLC Number:

  • TP391