计算机科学 ›› 2019, Vol. 46 ›› Issue (2): 294-300.doi: 10.11896/j.issn.1002-137X.2019.02.045

所属专题: 生物信息学

• 交叉与前沿 • 上一篇    下一篇

一种基于同配性的重叠蛋白质复合体检测算法

王杰, 梁吉业, 赵兴旺, 郑文萍   

  1. 山西大学计算机与信息技术学院 太原030006
    山西大学计算智能与中文信息处理教育部重点实验室 太原 030006
  • 收稿日期:2018-09-26 出版日期:2019-02-25 发布日期:2019-02-25
  • 通讯作者: 梁吉业(1962-),男,博士,教授,CCF杰出会员,主要研究方向为粒计算、数据挖掘与机器学习,E-mail:ljy@sxu.edu.cn
  • 作者简介:王 杰(1988-),男,博士生,CCF会员,主要研究方向为数据挖掘与生物信息学,E-mail:xhcwj@sina.com;赵兴旺(1984-),男,博士生,CCF会员,主要研究方向为数据挖掘与机器学习;郑文萍(1979-),女,博士,副教授,CCF会员,主要研究方向为图论算法和生物信息学。
  • 基金资助:
    本文受国家自然科学基金项目(61876103,61603230),山西省重点研发计划项目(201603D111014)资助。

Overlapping Protein Complexes Detection Algorithm Based on Assortativity in PPI Networks

WANG Jie, LIANG Ji-ye, ZHAO Xing-wang, ZHENG Wen-ping   

  1. School of Computer and Information Technology,Shanxi University,Taiyuan 030006,China
    Key Laboratory of Computational Intelligence and Chinese Information Processing(Shanxi University),Ministry of Education,Taiyuan 030006,China
  • Received:2018-09-26 Online:2019-02-25 Published:2019-02-25

摘要: 蛋白质复合体在生物过程中具有重要的作用,从蛋白质互作用网络中进行蛋白质复合体检测是后基因时代的一项具有挑战性的任务。种子扩展方法是一种从蛋白质互作用网络中进行重叠蛋白质复合体检测的有效技术。然而,现有方法面临两方面的问题:1)在选择种子结点时通常仅仅考虑了网络中结点的直接邻居之间的连接紧密度,难以充分体现结点在局部邻域子图内的重要性;2)在簇的扩展过程中假设候选结点之间是相互独立的,忽略了候选结点的添加顺序可能对聚类结果带来的影响。为了解决以上问题,文中基于生物网络同配性提出了一种重叠蛋白质复合体检测算法。该算法利用结点的二阶邻域信息来度量结点的重要性,进而选择种子结点,在簇扩展过程中利用同配性实现多个候选结点的批量添加。为了对重叠聚类结果进行评价,提出了一种重叠复合体评价指标F-overlap。与其他复合体检测算法在蛋白质互作用数据集上的对比实验结果表明,所提算法能够有效地进行重叠蛋白质复合体检测。

关键词: 蛋白质互作用网络, 复合体检测, 同配性, 种子扩展方法

Abstract: Protein complexes play significant roles in biological processes.The detection of protein complexes from available protein-protein interaction (PPI) networks is one of the most challenging tasks in the post-genome era.Seed expansion method is an effective clustering technique for overlapping protein complexes detection from PPI networks.However,existing methods are usually faced with two problems.One is that they only consider link density between direct neighbors of nodes in a network in the step of seed selection,which is not enough to indicate the importance of nodes in local subgraphs consisting of their neighborhoods.The other is that candidate nodes are assumed to be independent from each other,ignoring the impact of candidate nodes’ order on clustering in the process of cluster extension.To solve the problems,this paper proposed an overlapping protein complexes detection algorithm based on assortativity,which considers 2-order neighborhood of nodes in the process of seed selection,and multiple candidate nodes are added into clusters based on assortativity in networks in the process of cluster expansion.In order to evaluate overlapping results,a new evaluation index named F-overlap was presented.Experiment results on PPI networks show that the proposed algorithm can effectively identify overlapping protein complexes.

Key words: Assortativity, Complexes detection, Protein-protein interaction network, Seed expansion method

中图分类号: 

  • TP181
[1]SPIRIN V,MIRNY L A.Protein complexes and functional mo- dules in molecular networks [J].Proceedings of the National Academy of Sciences,2003,100(21):12123-12128.
[2]JI J Z,LIU Z J,LIU H X,et al.An overview of research on functional module detection for protein-protein interaction networks [J].Acta Automatica Sinica,2014,40(4):577-593.(in Chinese)
冀俊忠,刘志军,刘红欣,等.蛋白质相互作用网络功能模块检测的研究综述[J].自动化学报,2014,40(4):577-593.
[3]BRUN C,HERRMANN C,GUENOCHE A.Clustering proteins from interaction networks for the prediction of cellular functions [J].Bmc Bioinformatics,2004,5(1):95.
[4]YU L,GAO L,SUN P G.Research on algorithms for complexes and functional modules prediction in protein-protein interaction networks [J].Chinese Journal of Computers,2011,34(7):1239-1251.(in Chinese)
鱼亮,高琳,孙鹏岗.蛋白质网络中复合体和功能模块预测算法研究[J].计算机学报,2011,34(7):1239-1251.
[5]LI X L,WU M,KWOH C K,et al.Computational approaches for detecting protein complexes from protein interaction networks:A survey [J].Bmc Genomics,2010,11(1):S3.
[6]REN J,WANG J X,LI M.Identifying protein complexes based on local fitness method [C]∥ Proceedings of IEEE International Conference on Bioinformatics and Biomedicine.Piscataway,NJ:IEEE,2012:205-210.
[7]OUGHTRED R,CHATR-ARYAMONTRI A,BREITKREUTZ B J,et al.BioGRID:A resource for studying biological interactions in yeast [J].Cold Spring Harbor Protocols,2016,2016(1):080754.
[8]KESKIN O,TUNCBAG N,GURSOY A.Predicting protein- protein interactions from the molecular to the proteome level [J].Chemical Reviews,2016,116(8):4884.
[9]ASUR S,UCAR D,PARTHASARATHY S.An ensemble framework for clustering protein-protein interaction networks [J].Bioinformatics,2007,23(13):29-40.
[10]NEPUSZ T,YU H,PACCANARO A.Detecting overlapping protein complexes in protein-protein interaction networks [J].Nature Methods,2012,9(5):471-472.
[11]BHOWMICK S S,SEAH B S.Clustering and summarizing protein-protein interaction networks:A survey [J].IEEE Transactions on Knowledge and Data Engineering,2016,28(3):638-658.
[12]SHIH Y K,PARTHASARATHY S.Identifying functional mo- dules in interaction networks through overlapping Markov clustering [J].Bioinformatics,2012,28(18):473-479.
[13]WANG J,LIANG J Y,ZHENG W P.A graph clustering me- thod for detecting protein complexes [J].Journal of Computer Research and Development,2015,52(8):1784-1793.(in Chinese)
王杰,梁吉业,郑文萍.一种面向蛋白质复合体检测的图聚类方法[J].计算机研究与发展,2015,52(8):1784-1793.
[14]ADAMCSEK B,PALLA G,FARKAS I J,et al.CFinder:Locating cliques and overlapping modules in biological networks [J].Bioinformatics,2006,22(8):1021-1023.
[15]LIU G M,WONG L,CHUA H N.Complex discovery from weighted PPI networks [J].Bioinformatics,2009,25(15):1891-1897.
[16]BADER G D,HOGUE C W V.An automated method for fin- ding molecular complexes in large protein interaction networks [J].Bmc Bioinformatics,2003,4(1):2.
[17]ALTAF-UL-AMIN M,SHINBO Y,MIHARA K,et al.Deve- lopment and implementation of an algorithm for detection of protein complexes in large interaction networks [J].Bmc Bioinformatics,2006,7(1):207.
[18]WANG J,ZHENG W P,QIAN Y H,et al.A seed expansion graph clustering method for protein complexes detection in protein interaction networks [J].Molecules,2017,22(12):2179.
[19]LI M,CHEN J E,WANG J X,et al.Modifying the DPClus algorithm for identifying protein complexes based on new topological structures [J].Bmc Bioinformatics,2008,9(1):398.
[20]GAVIN A C,ALOY P,GRANDI P,et al.Proteome survey reveals modularity of the yeast cell machinery [J].Nature,2006,440(7084):631-636.
[21]LEUNG H C M,XIANG Q,YIU S M,et al.Predicting protein complexes from PPI data:A core-attachment approach [J].Journal of Computational Biology,2009,16(2):133-144.
[22]KOUHSAR M,ZARE-MIRAKABAD F,JAMALI Y.WCO- ACH:Protein complex prediction in weighted PPI networks [J].Genes and Genetic Systems,2015,90(5):317-324.
[23]NEWMAN M E J.Assortative mixing in networks [J].Physical Review Letters,2002,89(20):208701.
[24]NOLDUS R,VAN M P.Assortativity in complex networks [J].Journal of Complex Networks,2015,3(4):507-542.
[25]KUNEGIS J.Konect:The koblenz network collection [C]∥ Proceedings of the 22nd International Conference on World Wide Web.New York:ACM,2013:1343-1350.
[26]CHERRY J M,HONG E L,AMUNDSEN C,et al.Saccharomyces genome database:The genomics resource of budding yeast [J].Nucleic Acids Research,2011,40(1):700-705.
[27]LEE A J T,LIN M C,HSU C M.Mining dense overlapping subgraphs in weighted protein-protein interaction networks[J].Bio-Systems,2011,103(3):392-399.
[28]WHANG J J,GLEICH D F,DHILLON I S.Overlapping community detection using neighborhood-inflated seed expansion [J].IEEE Transactions on Knowledge and Data Engineering,2016,28(5):1272-1284.
[29]LUO J W,QI Y.Identification of essential proteins based on a new combination of local interaction density and protein complexes [J].Plos One,2015,10(6):0131418.
[30]QIN C,SUN Y Q,DONG Y D.A new method for identifying essential proteins based on network topology properties and protein complexes [J].Plos One,2016,11(8):0161042.
[31]TRAJANOVSKI S,MARTIN-HERNANDEZ J,WINTERBA- CH W,et al.Robustness envelopes of networks [J].Journal of Complex Networks,2013,1(1):44-62.
[32]CAO B W,LUO J W,LIANG C,et al.Moepga:A novel method to detect protein complexes in yeast protein-protein interaction networks based on multiobjective evolutionary programming genetic algorithm [J].Computational Biology and Chemistry,2015,58:173-181.
[33]GAVIN A C,BOSCHE M,KRAUSE R,et al.Functional orga- nization of the yeast proteome by systematic analysis of protein complexes [J].Nature,2002,415(6868):141-147.
[34]KROGAN N J,CAGNEY G,YU H,et al.Global landscape of protein complexes in the yeast Saccharomyces cerevisiae[J].Nature,2006,440(7084):637-643.
[1] 程章桃, 钟婷, 张晟铭, 周帆.
基于图学习的推荐系统研究综述
Survey of Recommender Systems Based on Graph Learning
计算机科学, 2022, 49(9): 1-13. https://doi.org/10.11896/jsjkx.210900072
[2] 熊丽琴, 曹雷, 赖俊, 陈希亮.
基于值分解的多智能体深度强化学习综述
Overview of Multi-agent Deep Reinforcement Learning Based on Value Factorization
计算机科学, 2022, 49(9): 172-182. https://doi.org/10.11896/jsjkx.210800112
[3] 齐秀秀, 王佳昊, 李文雄, 周帆.
基于概率元学习的矩阵补全预测融合算法
Fusion Algorithm for Matrix Completion Prediction Based on Probabilistic Meta-learning
计算机科学, 2022, 49(7): 18-24. https://doi.org/10.11896/jsjkx.210600126
[4] 高振卓, 王志海, 刘海洋.
嵌入典型时间序列特征的随机Shapelet森林算法
Random Shapelet Forest Algorithm Embedded with Canonical Time Series Features
计算机科学, 2022, 49(7): 40-49. https://doi.org/10.11896/jsjkx.210700226
[5] 孙晓寒, 张莉.
基于评分区域子空间的协同过滤推荐算法
Collaborative Filtering Recommendation Algorithm Based on Rating Region Subspace
计算机科学, 2022, 49(7): 50-56. https://doi.org/10.11896/jsjkx.210600062
[6] 刘卫明, 安冉, 毛伊敏.
基于聚类和WOA的并行支持向量机算法
Parallel Support Vector Machine Algorithm Based on Clustering and WOA
计算机科学, 2022, 49(7): 64-72. https://doi.org/10.11896/jsjkx.210500040
[7] 周慧, 施皓晨, 屠要峰, 黄圣君.
基于主动采样的深度鲁棒神经网络学习
Robust Deep Neural Network Learning Based on Active Sampling
计算机科学, 2022, 49(7): 164-169. https://doi.org/10.11896/jsjkx.210600044
[8] 苏丹宁, 曹桂涛, 王燕楠, 王宏, 任赫.
小样本雷达辐射源识别的深度学习方法综述
Survey of Deep Learning for Radar Emitter Identification Based on Small Sample
计算机科学, 2022, 49(7): 226-235. https://doi.org/10.11896/jsjkx.210600138
[9] 于滨, 李学华, 潘春雨, 李娜.
基于深度强化学习的边云协同资源分配算法
Edge-Cloud Collaborative Resource Allocation Algorithm Based on Deep Reinforcement Learning
计算机科学, 2022, 49(7): 248-253. https://doi.org/10.11896/jsjkx.210400219
[10] 王宇飞, 陈文.
基于DECORATE集成学习与置信度评估的Tri-training算法
Tri-training Algorithm Based on DECORATE Ensemble Learning and Credibility Assessment
计算机科学, 2022, 49(6): 127-133. https://doi.org/10.11896/jsjkx.211100043
[11] 洪志理, 赖俊, 曹雷, 陈希亮, 徐志雄.
基于遗憾探索的竞争网络强化学习智能推荐方法研究
Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration
计算机科学, 2022, 49(6): 149-157. https://doi.org/10.11896/jsjkx.210600226
[12] 陈章辉, 熊贇.
基于解耦-检索-生成的图像风格化描述生成模型
Stylized Image Captioning Model Based on Disentangle-Retrieve-Generate
计算机科学, 2022, 49(6): 180-186. https://doi.org/10.11896/jsjkx.211100129
[13] 徐辉, 康金梦, 张加万.
基于特征感知的数字壁画复原方法
Digital Mural Inpainting Method Based on Feature Perception
计算机科学, 2022, 49(6): 217-223. https://doi.org/10.11896/jsjkx.210500105
[14] 许杰, 祝玉坤, 邢春晓.
机器学习在金融资产定价中的应用研究综述
Application of Machine Learning in Financial Asset Pricing:A Review
计算机科学, 2022, 49(6): 276-286. https://doi.org/10.11896/jsjkx.210900127
[15] 罗俊仁, 张万鹏, 陆丽娜, 陈璟.
即时策略博弈在线对抗规划方法综述
Survey on Online Adversarial Planning for Real-time Strategy Game
计算机科学, 2022, 49(6): 287-296. https://doi.org/10.11896/jsjkx.210600168
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!