Computer Science ›› 2021, Vol. 48 ›› Issue (9): 251-256. doi: 10.11896/jsjkx.200700066
谢良旭1,2, 李峰3, 谢建平4, 许晓军1
XIE Liang-xu1,2, LI Feng3, XIE Jian-ping4, XU Xiao-jun1
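Abstract: In bioinformatics, artificial intelligence methods have achieved major success in predicting the physicochemical properties and biological activities of drug molecules, and neural networks in particular have been widely applied in drug discovery. However, shallow neural networks have low prediction accuracy, while deep neural networks are prone to overfitting; model ensemble strategies are expected to improve the predictive ability of weak learners in machine learning. Accordingly, this paper applies model ensemble methods to the prediction of drug molecule properties for the first time: the chemical structures of drug molecules are numerically encoded, and shallow neural networks are combined by averaging and by stacking to improve the prediction of drug molecule pKa. Compared with deep learning methods, the model fused by stacking achieves higher prediction accuracy, with a correlation coefficient of 0.86 for its predictions. Properly combining multiple weak-learner neural networks allows them to reach the prediction accuracy of deep neural networks while retaining better generalization ability. The results show that model ensemble methods can improve the accuracy and reliability of neural network predictions of drug molecule pKa.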
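The difference between the two fusion strategies named in the abstract, averaging and stacking, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the data are synthetic random bit vectors standing in for encoded molecules, the target is made up rather than measured pKa, and the hypothetical `fit_shallow_net` helper uses a fixed random hidden layer with a ridge-fitted output layer as a cheap stand-in for a trained shallow neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT real pKa measurements): 200 "molecules",
# each encoded as a 64-bit binary fingerprint, with a made-up target.
X = rng.integers(0, 2, size=(200, 64)).astype(float)
y = 7.0 + 0.3 * (X @ rng.normal(size=64)) + rng.normal(scale=0.2, size=200)

# Three disjoint folds: base-learner training, meta-learner fitting, testing.
X_tr, X_hold, X_te = X[:100], X[100:150], X[150:]
y_tr, y_hold, y_te = y[:100], y[100:150], y[150:]


def fit_shallow_net(X, y, n_hidden, seed, ridge=1.0):
    """Hypothetical weak learner: one tanh hidden layer with fixed random
    weights; only the output layer is fitted, by ridge regression."""
    r = np.random.default_rng(seed)
    W = r.normal(scale=1.0 / np.sqrt(X.shape[1]), size=(X.shape[1], n_hidden))
    b = r.normal(size=n_hidden)

    def hidden(Xn):
        H = np.tanh(Xn @ W + b)
        return np.hstack([H, np.ones((len(H), 1))])  # append a bias column

    H = hidden(X)
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ y)
    return lambda Xn: hidden(Xn) @ beta


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Train several shallow networks that differ in width and random seed.
nets = [fit_shallow_net(X_tr, y_tr, n_hidden=8 * (k + 1), seed=k) for k in range(5)]

# Base-learner predictions, one column per network.
P_hold = np.column_stack([net(X_hold) for net in nets])
P_te = np.column_stack([net(X_te) for net in nets])

# Averaging: equal weight for every base learner.
avg_hold, avg_te = P_hold.mean(axis=1), P_te.mean(axis=1)

# Stacking: a linear meta-learner (least squares with intercept) fitted on
# held-out base predictions, then applied to the test predictions.
A_hold = np.hstack([P_hold, np.ones((len(P_hold), 1))])
meta, *_ = np.linalg.lstsq(A_hold, y_hold, rcond=None)
stack_hold = A_hold @ meta
stack_te = np.hstack([P_te, np.ones((len(P_te), 1))]) @ meta

print("averaging test RMSE:", rmse(avg_te, y_te))
print("stacking  test RMSE:", rmse(stack_te, y_te))
print("stacking  test correlation:", float(np.corrcoef(stack_te, y_te)[0, 1]))
```

On the fold used to fit the meta-learner, the stacked error cannot exceed the averaging error, because equal weights are just one of the linear combinations the least-squares fit may choose. On unseen test data there is no such guarantee, which is why the meta-learner is fitted on predictions the base networks did not train on.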