Computer Science, 2022, Vol. 49, Issue (10): 36-43. doi: 10.11896/jsjkx.220100129
李治莹1,2, 马硕1,2, 周超1,2, 马英晋1, 刘倩1, 金钟1
LI Zhi-ying1,2, MA Shuo1,2, ZHOU Chao1,2, MA Ying-jin1, LIU Qian1, JIN Zhong1
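Abstract: Among commonly used first-principles methods, density functional theory (DFT) combines low computational scaling with high accuracy, and it has therefore been applied ever more widely in chemistry, biology, medicine, and related fields. In practice, however, its relatively high computational cost poses new challenges both for users choosing calculation parameters and for computing centers allocating jobs. A recently developed machine-learning-based system for predicting DFT computational time can estimate the actual computational cost before a calculation is run; its mean relative error is generally below 0.15, which meets the accuracy required in real computing scenarios. This paper further advances and refines that prediction system. It adds multi-GPU parallel computing and modular extension of the machine learning models; it integrates the system with a biomedical community platform so that the estimated machine time of the platform's computational tasks is displayed in real time, which helps users plan; and, building on this, it develops an intelligent load-balancing module that can improve the parallel efficiency of first-principles calculations on very large molecules and cluster systems. Together these advances improve the practicality of the prediction system, which has seen initial application on the community platform and in parallel computing.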
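The two quantitative ideas in the abstract, a mean relative error on predicted times and load balancing driven by those predictions, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names are hypothetical, and the greedy longest-processing-time-first heuristic is swapped in here only as one common way to balance jobs when per-job cost estimates are known in advance.

```python
import heapq


def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / actual over all jobs."""
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


def balance_by_predicted_time(predicted_times, n_workers):
    """Assign jobs to workers using predicted run times (hypothetical helper).

    Greedy longest-processing-time-first: take jobs from longest to shortest
    and give each one to the worker with the smallest accumulated load.
    Returns (assignment, loads), where assignment[w] lists the job indices
    given to worker w and loads[w] is the sum of their predicted times.
    """
    # Min-heap of (accumulated predicted load, worker index).
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]

    longest_first = sorted(
        range(len(predicted_times)),
        key=lambda j: predicted_times[j],
        reverse=True,
    )
    for job in longest_first:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(job)
        heapq.heappush(heap, (load + predicted_times[job], worker))

    loads = [sum(predicted_times[j] for j in jobs) for jobs in assignment]
    return assignment, loads
```

For example, with predicted times `[8.0, 7.0, 6.0, 5.0, 4.0]` and two workers, the sketch places jobs 0, 3, and 4 on one worker and jobs 1 and 2 on the other. How well the real workloads balance then depends on how accurate the predictions are, which is why a bounded relative error matters for this use.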