Computer Science, 2019, Vol. 46, Issue (12): 313-321. doi: 10.11896/jsjkx.181102215

• Interdisciplinary and Frontier •

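PPI Network Inference Algorithm for PCP-MS Data

CHEN Zheng, TIAN Bo, HE Zeng-you

  1. (School of Software Technology, Dalian University of Technology, Dalian, Liaoning 116000, China)
  • Received: 2018-11-29  Online: 2019-12-15  Published: 2019-12-17
  • Corresponding author: HE Zeng-you (1976-), male, Ph.D., professor, doctoral supervisor, senior member of CCF. His main research interests include bioinformatics, machine learning and data mining. E-mail: zyhe@dlut.edu.cn.
  • About the authors: CHEN Zheng (1994-), male, master's student. His main research interests include machine learning, data mining and bioinformatics. E-mail: chenzheng_dut@163.com. TIAN Bo (1994-), female, master's student. Her main research interests include machine learning, data mining and bioinformatics.
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61572094).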

Abstract: With the development of proteomics, researchers have begun to focus on building the complete human protein-protein interaction (PPI) network, and mass spectrometry (MS) has become a representative technique for predicting protein-protein interactions. MS is one of the main experimental means of constructing PPI networks, and MS-based experiments have produced large amounts of protein purification data, such as affinity purification-mass spectrometry (AP-MS) data and protein correlation profiling-mass spectrometry (PCP-MS) data. These data provide important support for PPI network construction, but building PPI networks by hand is both inefficient and impractical. Network inference algorithms for PCP-MS data have therefore become a research hotspot in bioinformatics. This paper studies the construction of PPI networks from PCP-MS data, one mainstream type of MS data, with the aim of overcoming current bottlenecks and building high-quality PPI networks. Research on PPI network inference from PCP-MS data is still at an early stage, and few methods exist. Moreover, the quality of existing results suffers from several problems: 1) the outputs of different inference algorithms contain many false interactions while missing some true ones; 2) different inference algorithms perform very differently on the same dataset; 3) the performance of the same algorithm varies widely across datasets. To infer reliable, high-quality PPI networks from PCP-MS data, this paper proposes a PPI scoring method based on correlation analysis and rank aggregation. The method is unsupervised and consists of two steps: 1) compute correlation coefficients between proteins, yielding multiple sets of correlation results; 2) integrate these sets by rank aggregation into a single PPI score for each protein pair. Experimental results show that, without using a reference standard, the proposed method achieves results close to those of supervised learning methods that rely on a standard reference set.

Key words: Correlation analysis, MS data, PPI network, Protein direct interaction, Rank aggregation
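The abstract describes the method only as two steps and does not say which correlation coefficients or which aggregation rule are used. The sketch below illustrates the general idea under stated assumptions: each protein is represented by an abundance profile over the same set of fractions; the three measures (Pearson, Spearman, cosine) are hypothetical choices; and the aggregation rule (mean of normalized ranks, a simple Borda-style stand-in) is not necessarily the authors' choice. The names `score_pairs`, `aggregate` and the profiles P1 to P4 are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def ranks(values):
    """1-based ascending ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def cosine(x, y):
    """Cosine similarity (0.0 if either profile is all zeros)."""
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


# Hypothetical set of measures; the paper's actual choice is not stated in the abstract.
MEASURES = {"pearson": pearson, "spearman": spearman, "cosine": cosine}


def score_pairs(profiles):
    """Step 1: one list of scores per measure, aligned with the list of protein pairs."""
    pairs = list(combinations(sorted(profiles), 2))
    scores = {name: [f(profiles[a], profiles[b]) for a, b in pairs]
              for name, f in MEASURES.items()}
    return pairs, scores


def aggregate(pairs, scores):
    """Step 2: average the normalized ranks across measures (1.0 = best under all)."""
    n = len(pairs)
    combined = [0.0] * n
    for vals in scores.values():
        # Ascending ranks: the highest score gets rank n, i.e. normalized rank 1.0.
        for i, rk in enumerate(ranks(vals)):
            combined[i] += rk / n / len(scores)
    return sorted(zip(pairs, combined), key=lambda t: -t[1])


if __name__ == "__main__":
    # Made-up co-fractionation profiles (abundance in six fractions).
    profiles = {
        "P1": [0, 2, 8, 3, 0, 0],
        "P2": [0, 3, 9, 2, 0, 0],
        "P3": [5, 1, 0, 0, 1, 6],
        "P4": [0, 0, 1, 4, 9, 3],
    }
    pairs, scores = score_pairs(profiles)
    for pair, s in aggregate(pairs, scores):
        print(pair, round(s, 3))
```

On these made-up profiles, P1 and P2 peak in the same fraction and rank first under every measure, so their aggregated score is 1.0. Averaging ranks rather than raw coefficients puts measures with different scales and distributions on a common footing, which is the general motivation for integrating several correlation results by rank.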

CLC number: TP391.41