Computer Science, 2020, Vol. 47, Issue (1): 59-65. doi: 10.11896/jsjkx.181202395
谢腾宇1,周晓根2,胡俊1,张贵军1
XIE Teng-yu1, ZHOU Xiao-gen2, HU Jun1, ZHANG Gui-jun1
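Abstract: Ab initio prediction is an important approach to protein structure modeling. Research on it helps in understanding protein function and, in turn, supports drug design and disease treatment. To improve prediction accuracy, this paper proposes a protein structure prediction algorithm based on contact-map residue-pair distance constraints (CDPSP). Built on an evolutionary algorithm framework, CDPSP divides conformational space sampling into two stages: exploration and enhancement. In the exploration stage, a mutation and selection strategy based on residue-pair distance is designed. Residue pairs are selected according to the contact probabilities in the contact map, and the regions neighboring the selected residue pairs are mutated by fragment assembly. The residue-pair distance is discretized into several regions, each assigned an expected probability, and this expected probability determines whether the mutated conformation is selected, which increases population diversity. In the enhancement stage, a scoring metric based on contact-map information is combined with the energy function to measure conformation quality and select the better conformations, which strengthens CDPSP's ability to sample near-native regions. To verify its performance, the algorithm was tested on 10 FM-group target proteins from CASP12 and compared with several state-of-the-art algorithms. The experimental results show that CDPSP can predict three-dimensional protein structure models with high accuracy.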
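The exploration-stage selection described in the abstract could be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation: the function names, the bin edges (8, 12, 16 Å), and the expected probabilities are hypothetical placeholders, and the fragment-assembly mutation itself is omitted.

```python
import random
from bisect import bisect_right

# Hypothetical discretisation of the residue-pair distance (in angstroms):
# three edges give four regions: <8, [8, 12), [12, 16), >=16.
BIN_EDGES = [8.0, 12.0, 16.0]
# Hypothetical expected probability assigned to each region.
EXPECTED_PROB = [0.9, 0.6, 0.3, 0.1]


def pick_residue_pair(contacts, rng):
    """Sample one residue pair, weighted by predicted contact probability.

    contacts is a list of (i, j, probability) tuples from a contact map.
    """
    weights = [p for _, _, p in contacts]
    i, j, _ = rng.choices(contacts, weights=weights, k=1)[0]
    return i, j


def distance_bin(distance, edges=BIN_EDGES):
    """Return the index of the distance region that `distance` falls into."""
    return bisect_right(edges, distance)


def accept_mutant(distance, rng, edges=BIN_EDGES, expected_prob=EXPECTED_PROB):
    """Keep a mutated conformation with the expected probability of its region.

    `distance` is the residue-pair distance measured in the mutated
    conformation; computing it from coordinates is left out of this sketch.
    """
    return rng.random() < expected_prob[distance_bin(distance, edges)]


if __name__ == "__main__":
    rng = random.Random(0)
    contact_map = [(5, 40, 0.92), (12, 33, 0.55), (20, 61, 0.18)]
    pair = pick_residue_pair(contact_map, rng)
    print("selected pair:", pair)
    print("accept mutant at 10.5 A:", accept_mutant(10.5, rng))
```

Accepting mutants probabilistically rather than greedily is one way to retain conformations whose residue-pair distance is not yet close to contact, which matches the stated goal of increasing population diversity.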