Computer Science, 2022, Vol. 49, Issue (10): 36-43. doi: 10.11896/jsjkx.220100129

• High Performance Computing

“AI+HPC”-based Time Prediction for the First Principle Calculations and Its Applications in Biomed Community

LI Zhi-ying1,2, MA Shuo1,2, ZHOU Chao1,2, MA Ying-jin1, LIU Qian1, JIN Zhong1   

  1 Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2 University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-01-14  Revised: 2022-05-10  Online: 2022-10-15  Published: 2022-10-13
  • About author: LI Zhi-ying, born in 1997, postgraduate, is a member of China Computer Federation. Her main research interests include machine learning, load balancing, and first-principles calculation.
    JIN Zhong, born in 1974, Ph.D., researcher, is a member of China Computer Federation. His main research interests include quantum chemical calculation, biomed community, and parallel computing.
  • Supported by:
    National Key Research and Development Program of China (2020YFB0204802), National Natural Science Foundation of China (22173114), Youth Innovation Promotion Association of CAS (2022168) and GHfund B (202107020447).

Abstract: Among the commonly used first-principles methods, density functional theory (DFT) combines relatively low computational scaling with high accuracy, so it is increasingly used in chemistry, biology, medicine and related fields. In practice, however, its relatively high computational cost poses new challenges, both for users deciding on calculation parameters and for computing centers assigning tasks. We recently developed a machine-learning-based time prediction system for DFT calculations that estimates the actual computational cost before a calculation is run. Its mean relative errors are normally below 0.15, which meets the accuracy requirements of practical scenarios. In this work, we further extend and improve the prediction system. First, we add multi-GPU parallel computing and support for adding machine learning models as modules. Second, we integrate the system with the biomed community platform so that computing tasks submitted to the platform are displayed in real time, making it easier for users to coordinate. Third, we develop an intelligent load-balancing module that improves the efficiency of first-principles calculations on very large molecules and cluster systems. These efforts make the forecasting system more practical, and we report preliminary applications on both the community platform and in parallel computing.
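To make the two quantitative ideas in the abstract concrete, the Python sketch below shows (i) a mean relative error of the kind quoted above, assuming the common definition |predicted - actual| / actual averaged over all jobs, and (ii) one generic way predicted runtimes can drive load balancing: greedy longest-processing-time-first (LPT) assignment. The function names and job names are hypothetical, and LPT is a standard heuristic chosen only for illustration; it is not necessarily the strategy used by the load-balancing module described here.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def mean_relative_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean of |predicted - actual| / actual over all jobs.

    Assumed definition (common in runtime prediction); actual times must be > 0.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual times must be positive")
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


def assign_by_predicted_time(
    predicted: Dict[str, float], n_workers: int
) -> Tuple[List[List[str]], List[float]]:
    """Hypothetical helper: greedy LPT assignment using predicted job times.

    Jobs are taken largest predicted time first; each one goes to the worker
    whose accumulated predicted load is currently smallest (min-heap lookup).
    """
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    jobs: List[List[str]] = [[] for _ in range(n_workers)]
    loads = [0.0] * n_workers
    for name, t in sorted(predicted.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        jobs[w].append(name)
        loads[w] = load + t
        heapq.heappush(heap, (loads[w], w))
    return jobs, loads


if __name__ == "__main__":
    # Hypothetical fragment jobs with model-predicted times (arbitrary units).
    pred = {"frag_a": 120.0, "frag_b": 90.0, "frag_c": 60.0, "frag_d": 50.0, "frag_e": 30.0}
    jobs, loads = assign_by_predicted_time(pred, n_workers=2)
    print(jobs, loads)
    print(mean_relative_error([110.0, 95.0], [100.0, 100.0]))
```

With the sample predictions, the two workers receive predicted loads of 170 and 180 time units. Sorting largest-first is the key design choice: the most expensive jobs are placed while all workers are still lightly loaded, and the small jobs then even out the remaining imbalance. How close the real loads come to these predicted loads depends on the prediction error, which is why a low mean relative error matters for scheduling.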

Key words: Density functional theory, High-performance computing, Community services, Machine learning, Load balancing

CLC Number: TP391