Computer Science ›› 2023, Vol. 50 ›› Issue (6): 1-9. doi: 10.11896/jsjkx.220700162

• High Performance Computing •

Many-core Optimization Method for the Calculation of Ab initio Polarizability

LUO Haiwen, WU Yangjun, SHANG Honghui   

  1. State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received:2022-07-18 Revised:2022-11-26 Online:2023-06-15 Published:2023-06-06
  • About author: LUO Haiwen, born in 1998, postgraduate, is a member of China Computer Federation. His main research interests include high performance computing and parallel software. SHANG Honghui, born in 1984, Ph.D, associate professor. Her main research interests include the development of first-principles methods and their application on high-performance computer systems.
  • Supported by:
    National Key Research and Development Program of China (2020YFB1709500) and National Natural Science Foundation of China (22003073).

Abstract: Density-functional perturbation theory (DFPT), which is based on quantum mechanics, can be used to calculate a variety of physicochemical properties of molecules and materials and is now widely used in research on new materials. Meanwhile, heterogeneous many-core processor architectures are becoming the mainstream of supercomputing. Redesigning and optimizing DFPT programs for heterogeneous many-core processors to improve their computational efficiency is therefore of great importance for the computation of physicochemical properties and their scientific applications. In this work, the computation of the first-order response density and the first-order response Hamiltonian matrix in DFPT is optimized for a many-core processor architecture and verified on the new generation Sunway processors. The optimization techniques are loop tiling, discrete memory access processing and collaborative reduction. Loop tiling divides the work into tasks that many cores can execute in parallel; discrete memory access processing converts discrete accesses into more efficient contiguous memory accesses; collaborative reduction resolves write conflicts. Experimental results show that, on one core group, the optimized program runs 8.2 to 74.4 times faster than the program before optimization, and that it has good strong scalability and weak scalability.
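The three techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' Sunway implementation: it is a plain Python stand-in in which the "cores" are simulated one after another, and every name in it (`tiles`, `reference`, `optimized`, `tgt`, `gat`, `weight`, `src`) is hypothetical. The toy kernel accumulates `weight[k] * src[gat[k]]` into `out[tgt[k]]`, so several iterations may write to the same output entry.

```python
def tiles(n, tile):
    """Split range(n) into contiguous [start, stop) tiles of at most `tile` items."""
    return [(s, min(s + tile, n)) for s in range(0, n, tile)]


def reference(out_size, tgt, gat, weight, src):
    """Straightforward serial loop with scattered reads and conflicting writes."""
    out = [0.0] * out_size
    for k in range(len(tgt)):
        out[tgt[k]] += weight[k] * src[gat[k]]
    return out


def optimized(out_size, tgt, gat, weight, src, n_workers=4, tile=3):
    """Same result, restructured with tiling, gathering and a split reduction.

    Workers are simulated sequentially; on real hardware each worker would be
    a separate core (an assumption of this sketch).
    """
    # One private accumulator per worker, so no two workers write the same cell.
    private = [[0.0] * out_size for _ in range(n_workers)]

    # 1) Loop tiling: cut the iteration space into tiles and hand them out
    #    round-robin to the workers.
    for t, (start, stop) in enumerate(tiles(len(tgt), tile)):
        w = t % n_workers

        # 2) Discrete memory access processing: gather the scattered source
        #    entries of this tile into a small contiguous buffer first.
        buf = [src[gat[k]] for k in range(start, stop)]

        # The compute loop then reads the buffer contiguously.
        for j, k in enumerate(range(start, stop)):
            private[w][tgt[k]] += weight[k] * buf[j]

    # 3) Collaborative reduction: the output is split into slices, and each
    #    worker sums its own slice across all private accumulators.
    out = [0.0] * out_size
    slice_len = max(1, -(-out_size // n_workers))  # ceiling division
    for start, stop in tiles(out_size, slice_len):
        for i in range(start, stop):
            out[i] = sum(p[i] for p in private)
    return out
```

Private accumulators trade extra memory for conflict-free writes, and splitting the final summation by output slice lets every worker take part in the reduction instead of leaving it to a single one.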

Key words: Density-functional perturbation theory, First-principles calculation, High-performance computing, New generation Sunway heterogeneous many-core processor

CLC Number: 

  • TP391