Computer Science, 2021, Vol. 48, Issue (9): 251-256. doi: 10.11896/jsjkx.200700066

• Artificial Intelligence •

Predicting Drug Molecular Properties Based on Ensembling Neural Networks Models

XIE Liang-xu1,2, LI Feng3, XIE Jian-ping4, XU Xiao-jun1

  1 Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China
  2 Jiangsu Sino-Israel Industrial Technology Research Institute, Changzhou, Jiangsu 213100, China
  3 School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China
  4 School of Science, Huzhou University, Huzhou, Zhejiang 313000, China
  • Received: 2020-07-10 Revised: 2020-10-20 Online: 2021-09-15 Published: 2021-09-10
  • Corresponding author: XU Xiao-jun (xuxiaojun@jsut.edu.cn)
  • About author: XIE Liang-xu (xieliangxu@jsut.edu.cn), born in 1987, postgraduate, associate professor, is a member of the China Computer Federation. His main research interests include AI-aided drug design and data mining.
    XU Xiao-jun, born in 1979, professor, Jiangsu distinguished professor. His main research interests include computational biophysics and AI-aided biomolecular structure prediction.
  • Supported by:
    National Natural Science Foundation of China (12074151, 22003020), Natural Science Foundation of Jiangsu Province, China (BK20191032), Changzhou Sci & Tech Program (CJ20200045), and funding from the Jiangsu Sino-Israel Industrial Technology Research Institute (JSIITRI202009)

Abstract: Artificial intelligence (AI) methods have achieved great success in predicting the physicochemical properties and bioactivity of drug molecules in bioinformatics, and neural networks in particular are widely used in drug discovery. However, shallow neural networks (SNNs) give low prediction accuracy, while deep neural networks (DNNs) are prone to overfitting. Model ensembling is expected to improve the predictive performance of weak learners in machine learning. This work therefore applies a model ensembling strategy, for the first time, to the prediction of drug molecular properties. After the chemical structures of the drug molecules are encoded, shallow neural networks are combined by averaging and by stacking to improve the prediction of pKa. Compared with the DNN, the stacked model gives higher predictive accuracy, with a Pearson correlation coefficient of 0.86. Combining several weak-learner neural networks reaches the accuracy of a DNN while retaining better generalization ability. The results show that model ensembling can increase the accuracy and reliability of neural-network pKa predictions for drug molecules.

Key words: Bioinformatics, Computer-aided drug discovery, Deep learning, Machine learning, Model ensembling

CLC Number: TP183
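
The two combination strategies named in the abstract, averaging and stacking, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the base predictions are simulated with random noise instead of coming from trained shallow networks, the targets are synthetic values in a pKa-like range, the stacking meta-learner is a plain least-squares linear model fitted on a held-out split, and the function names (`pearson_r`, `average_ensemble`, `fit_linear_meta`, `stack_predict`) are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ensemble(preds):
    """Averaging: mean of the base-model predictions for each molecule."""
    return [sum(col) / len(col) for col in zip(*preds)]


def fit_linear_meta(preds, y):
    """Stacking meta-learner: least-squares weights plus intercept (last entry)."""
    rows = [list(col) + [1.0] for col in zip(*preds)]  # one row per molecule
    k = len(rows[0])
    # Normal equations A w = b, with A = X^T X and b = X^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes A is non-singular).
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
                b[r] -= f * b[c]
    return [b[i] / A[i][i] for i in range(k)]


def stack_predict(weights, preds):
    """Apply the fitted meta-learner to a set of base-model predictions."""
    return [sum(w * p for w, p in zip(weights, col)) + weights[-1]
            for col in zip(*preds)]


def demo():
    rng = random.Random(0)
    y = [rng.uniform(2.0, 12.0) for _ in range(200)]  # synthetic targets
    # Simulated weak learners, each with its own bias, scale, and noise level.
    preds = [
        [0.8 * t + 1.0 + rng.gauss(0, 1.0) for t in y],
        [1.1 * t - 0.5 + rng.gauss(0, 1.5) for t in y],
        [t + rng.gauss(0, 2.0) for t in y],
    ]
    fit, test = slice(0, 100), slice(100, 200)
    # Fit the meta-learner on one split and evaluate both ensembles on the other.
    w = fit_linear_meta([p[fit] for p in preds], y[fit])
    held_out = [p[test] for p in preds]
    print("averaging r:", round(pearson_r(average_ensemble(held_out), y[test]), 3))
    print("stacking  r:", round(pearson_r(stack_predict(w, held_out), y[test]), 3))


if __name__ == "__main__":
    demo()
```

Averaging gives every base model equal weight, whereas the stacking step learns a weight for each model and an intercept, so it can correct systematic bias in individual learners. Fitting the meta-learner on data separate from the evaluation split keeps the comparison from favoring stacking by construction.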