Computer Science (计算机科学), 2023, Vol. 50, Issue (6): 1-9. doi: 10.11896/jsjkx.220700162
罗海文, 吴扬俊, 商红慧
LUO Haiwen, WU Yangjun, SHANG Honghui
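Abstract: Density functional perturbation theory (DFPT), which is based on quantum mechanics, can be used to compute many physical and chemical properties of molecules and materials, and is now widely applied in research on new materials and related fields. At the same time, heterogeneous many-core processor architectures are becoming mainstream in supercomputing. Redesigning and optimizing DFPT programs for heterogeneous many-core processors to improve their computational efficiency is therefore important for the computation of physical and chemical properties and for its scientific applications. This paper optimizes the computation of the first-order response density and the first-order response Hamiltonian matrix in DFPT for the many-core processor architecture, and validates the optimizations on the new-generation Sunway processor. The optimization techniques are loop tiling, discrete memory access processing, and collaborative reduction. Loop tiling partitions the work so that the many cores can execute it in parallel; discrete memory access processing converts discrete (scattered) memory accesses into more efficient contiguous accesses; collaborative reduction resolves write conflicts. Experimental results show that, on one core group, the performance of the optimized program improves by a factor of 8.2 to 74.4 over the unoptimized version, and that the program has good strong and weak scalability.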
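The three techniques named in the abstract can be illustrated with a small, sequential Python sketch of a scatter-add kernel. This is not the authors' implementation and does not use any Sunway API: the function names (`tiles`, `gather`, `scatter_add_tiled`), the round-robin assignment of tiles to workers, and the use of one private accumulator per worker as the reduction strategy are all assumptions made for illustration, and the "workers" are simulated in a single thread.

```python
def tiles(n, tile_size):
    """Loop tiling: split range(n) into consecutive (start, stop) tiles."""
    return [(s, min(s + tile_size, n)) for s in range(0, n, tile_size)]


def gather(values, indices):
    """Pack scattered entries of `values` into a contiguous buffer."""
    return [values[i] for i in indices]


def scatter_add_tiled(values, src_idx, dst_idx, n_out, n_workers=4, tile_size=3):
    """Compute out[dst_idx[k]] += values[src_idx[k]] tile by tile.

    Hypothetical sketch: workers are simulated sequentially.
    """
    # One private accumulator per simulated worker, so no two workers
    # ever write to the same memory location (no write conflicts).
    private = [[0.0] * n_out for _ in range(n_workers)]

    for t, (start, stop) in enumerate(tiles(len(src_idx), tile_size)):
        worker = t % n_workers  # assumed round-robin tile assignment
        # Scattered reads are gathered once into a contiguous buffer,
        # and the inner loop then walks that buffer sequentially.
        buf = gather(values, src_idx[start:stop])
        for k, x in enumerate(buf):
            private[worker][dst_idx[start + k]] += x

    # Reduction: combine the private accumulators into the final result.
    return [sum(p[j] for p in private) for j in range(n_out)]


def scatter_add_reference(values, src_idx, dst_idx, n_out):
    """Straightforward untiled loop, used only to check the sketch."""
    out = [0.0] * n_out
    for s, d in zip(src_idx, dst_idx):
        out[d] += values[s]
    return out
```

On real many-core hardware the private accumulators would typically live in each core's local memory and the final combination would be performed cooperatively across cores; the sketch only shows why separating accumulation from reduction removes the write conflicts.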