Computer Science ›› 2019, Vol. 46 ›› Issue (4): 222-227. doi: 10.11896/j.issn.1002-137X.2019.04.035

Special topic: Bioinformatics

• Artificial Intelligence •

Prediction of Protein Functions Based on Bi-weighted Vote

TANG Jia-qi1, WU Jing-li1,2,3, LIAO Yuan-xiu1, WANG Jin-yan1,2,3

  1. School of Computer Science & Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China
  2. Guangxi Key Laboratory of Multi-Source Information Mining & Safety, Guangxi Normal University, Guilin, Guangxi 541004, China
  3. Guangxi Regional Multi-Source Information Integration & Intelligent Processing Cooperation Innovation Center, Guilin, Guangxi 541004, China
  • Received: 2018-03-03 Online: 2019-04-15 Published: 2019-04-23
  • Corresponding author: WU Jing-li (born 1978), female, Ph.D., professor, CCF member. Her main research interests include bioinformatics and algorithm design and analysis. E-mail: wjlhappy@mailbox.gxnu.edu.cn
  • About the authors: TANG Jia-qi (born 1992), male, master's student; his main research interests include bioinformatics and machine learning. LIAO Yuan-xiu (born 1963), female, M.S., associate professor; her main research interests include artificial intelligence, formal methods, and robot knowledge representation and reasoning. WANG Jin-yan (born 1982), female, Ph.D., associate professor, CCF member; her main research interests include data security, uncertainty theory, and automated reasoning.
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61762015, 61502111, 61662007, 61763003), the Natural Science Foundation of Guangxi (2015GXNSFAA139288), the "Bagui Scholar" Program special fund, and the Guangxi Science and Technology Base and Talent Special Project (AD16380008).

Abstract: Proteins are essential molecules for carrying out important biological activities. Accurately determining their functions would greatly advance life science research and its applications. The development of high-throughput techniques has generated a tremendous number of protein sequences, so predicting protein functions at large scale with computational techniques has become one of the key tasks in bioinformatics. Prediction methods based on protein-protein interaction networks are currently a research hotspot in protein function prediction, but they still fall short in reducing the impact of data noise, making full use of network topology, and integrating multi-source data. This paper proposes the Bi-Weighted Vote (BiWV) algorithm for protein function prediction, which combines the global topological similarity produced by Random Walk with Resistance (RWS) with the semantic similarity between functional terms. Building on BiWV, the Bi-Weighted Vote algorithm with pathway (BiWV-P) is presented, which additionally integrates biological pathway information. Experiments on Saccharomyces cerevisiae and Homo sapiens data sets compare BiWV and BiWV-P with three existing algorithms: TMC, UBiRW and ProHG. The results indicate that BiWV and BiWV-P predict protein functions effectively and achieve higher micro-accuracy and micro-F1 than the other algorithms on many of the data sets.

Key words: Biological pathway, Function prediction, Protein-protein interaction network, Random walk, Semantic similarity
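
The abstract describes a vote that is weighted twice: once by a topological similarity between proteins and once by a semantic similarity between functional terms. The abstract does not give the scoring formula, so the following is only a minimal sketch of one plausible double-weighted vote, not the BiWV algorithm itself. The function names `bi_weighted_vote` and `top_k_terms`, the scoring rule, and the toy matrices are all assumptions made for illustration; in particular, the protein similarity matrix here is hand-written rather than produced by a random walk with resistance.

```python
from typing import List

Matrix = List[List[float]]


def bi_weighted_vote(topo_sim: Matrix, annotations: Matrix, term_sim: Matrix) -> Matrix:
    """Score every (protein, term) pair by a doubly weighted neighbour vote.

    ASSUMPTION: this rule is illustrative and is not taken from the paper.
    score[i][t] = sum over j != i, u of
                  topo_sim[i][j] * annotations[j][u] * term_sim[u][t]
    """
    n_proteins = len(annotations)
    n_terms = len(term_sim)
    scores = [[0.0] * n_terms for _ in range(n_proteins)]
    for i in range(n_proteins):
        for j in range(n_proteins):
            if i == j:
                continue  # a protein does not vote for itself
            weight = topo_sim[i][j]  # first weight: protein-protein similarity
            if weight == 0.0:
                continue
            for u in range(n_terms):
                if annotations[j][u] == 0:
                    continue  # protein j only votes with terms it carries
                for t in range(n_terms):
                    # second weight: similarity between term u and term t
                    scores[i][t] += weight * term_sim[u][t]
    return scores


def top_k_terms(score_row: List[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring terms (ties go to the lower index)."""
    order = sorted(range(len(score_row)), key=lambda t: (-score_row[t], t))
    return order[:k]


# Hypothetical toy data: 3 proteins, 2 terms; protein 2 is unannotated.
TOPO_SIM = [
    [1.0, 0.2, 0.8],
    [0.2, 1.0, 0.1],
    [0.8, 0.1, 1.0],
]
ANNOTATIONS = [
    [1, 0],  # protein 0 carries term 0
    [0, 1],  # protein 1 carries term 1
    [0, 0],  # protein 2 is the prediction target
]
TERM_SIM = [
    [1.0, 0.5],
    [0.5, 1.0],
]

SCORES = bi_weighted_vote(TOPO_SIM, ANNOTATIONS, TERM_SIM)
PREDICTED = top_k_terms(SCORES[2], k=1)
```

In this toy case protein 2 is most similar to protein 0, so term 0 receives the larger score and is the top-ranked prediction. A pathway-aware variant in the spirit of BiWV-P would need an additional data source that this sketch does not model.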

CLC Number: TP391