Computer Science, 2020, Vol. 47, Issue (11A): 40-45. doi: 10.11896/jsjkx.200200004

• Artificial Intelligence •

Essential Protein Identification Method Based on Structural Holes and Fusion of Multiple Data Sources

YANG Zhuang, LIU Pei-qiang, FEI Zhao-jie, LIU Chang

  1. School of Computer Science and Technology, Shandong Technology and Business University, Yantai, Shandong 264005, China
    Co-innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, Yantai, Shandong 264005, China
  • Online: 2020-11-15 Published: 2020-11-17
  • Corresponding author: LIU Pei-qiang (liupq@126.com)
  • About author: 1083628707@qq.com
    YANG Zhuang, born in 1992, MS. His main research interests include algorithms and complexity theory, and computational biology.
    LIU Pei-qiang, born in 1970, Ph.D, professor, is a member of China Computer Federation. His main research interests include algorithms and complexity theory, and computational biology.
  • Supported by:
    This work was supported by the Shandong Provincial Natural Science Foundation (ZR2017MF049) and the Key Research and Development Program of Yantai City (2017ZH065).

Abstract: Identifying essential proteins is an active and difficult research problem in computational biology. Existing computational methods include DC, BC, LAC, PeC, ION, and LIDC, but their identification accuracy still needs improvement, mainly because they rely on a single data source, the protein interaction network, which itself contains many false-positive and false-negative interactions. To improve identification accuracy, an efficient essential protein identification method, PSHC, is proposed. First, PSHC introduces structural hole theory into essential protein identification for the first time. Second, it fuses two data sources, the protein interaction network and protein complexes, to identify essential proteins. Experimental results on real data show that PSHC identifies more essential proteins than other traditional methods, and that its sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and F-measure are also markedly higher than those of the other methods.

Key words: Essential proteins, Protein complex, Protein interaction network, Structural holes
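
The six statistical indicators named in the abstract have standard confusion-matrix definitions, sketched below for reference. This is a minimal illustration, not the authors' evaluation code: the function name `classification_metrics`, the helper `_ratio`, and the counts in the usage example are all hypothetical, and the abstract does not state how predicted essential proteins are selected (for example, by a top-ranked cutoff).

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix statistics for an essential / non-essential call.

    tp: essential proteins predicted as essential
    fp: non-essential proteins predicted as essential
    tn: non-essential proteins predicted as non-essential
    fn: essential proteins predicted as non-essential
    """
    sensitivity = _ratio(tp, tp + fn)            # true positive rate
    specificity = _ratio(tn, tn + fp)            # true negative rate
    ppv = _ratio(tp, tp + fp)                    # positive predictive value
    npv = _ratio(tn, tn + fn)                    # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    # F-measure: harmonic mean of sensitivity (recall) and PPV (precision)
    f_measure = _ratio(2 * sensitivity * ppv, sensitivity + ppv)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=60, fp=40, tn=360, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, sensitivity and PPV are both 60/100, specificity and NPV are both 360/400, and accuracy is 420/500.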

CLC Number: 

  • TP301