Computer Science ›› 2025, Vol. 52 ›› Issue (12): 102-114. doi: 10.11896/jsjkx.250900062

• Database & Big Data & Data Science •

Protein Complex Identification Algorithm Based on Hypergraph Network Embedding

WANG Jie1, YANG Xiancan1, ZHAO Xingwang2

  1. 1 School of Information, Shanxi University of Finance and Economics, Taiyuan 030006, China
    2 School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received: 2025-09-09 Revised: 2025-11-10 Published: 2025-12-15 Online: 2025-12-09
  • Corresponding author: YANG Xiancan (937741973@qq.com)
  • Author e-mail: 20191031@sxufe.edu.cn
  • About author: WANG Jie, born in 1988, Ph.D., associate professor, is a member of CCF (No. N2805M). His main research interests include data mining and bioinformatics.
    YANG Xiancan, born in 2000, master, is a member of CCF (No. T8922G). His main research interests include data mining and machine learning.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (62006145) and the Fundamental Research Program of Shanxi Province (202303021212169).

Abstract: Protein complexes play a key role in cellular processes, and identifying them is essential for understanding cell functions and biological processes. Identifying protein complexes by network clustering in protein-protein interaction (PPI) networks has become a research hotspot in data mining and bioinformatics, and a variety of computational methods have been proposed. However, most existing methods use only the original network to mine dense subgraphs or subnetworks and fail to overcome the limitations of traditional graph structures in representing multi-node interactions. To address the many-to-many interactions prevalent in biological networks, this paper proposes a protein complex identification method based on hypergraph network embedding (PCIHNE). First, exploiting the ability of hypergraphs to model multi-way relations directly, the algorithm converts the original PPI network into a hypergraph network. Second, a hierarchical compression strategy recursively compresses the hypergraph into multiple smaller hypergraphs at different levels, constructing a multi-scale analysis framework. Third, hypergraph convolution is applied at each level to obtain node representations at different scales, and these representations are concatenated to form the complete node embedding. Based on the node embeddings, a weighted PPI network is then constructed on the original low-order network according to similarity. Finally, a core-attachment strategy is applied to the weighted PPI network to obtain the predicted protein complexes. PCIHNE is compared with other protein complex identification algorithms on multiple real yeast and human datasets, and the experimental results show that it achieves better identification performance in terms of F-measure and Accuracy.

Key words: Protein-protein interaction network, Protein complexes, Hypergraphs, Network embedding, Network clustering
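
The edge-weighting step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states only that per-level node representations are concatenated and that the weighted PPI network is built "by similarity" on the original network, so the use of cosine similarity here is an assumption, and the names `concat_levels` and `weight_ppi_edges` as well as the toy data are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def concat_levels(per_level):
    """Concatenate one node's per-level representations into a single embedding."""
    return [x for level in per_level for x in level]


def weight_ppi_edges(edges, embeddings):
    """Weight each edge of the original PPI network by the similarity of its
    endpoints' embeddings (cosine similarity is an assumed choice)."""
    return {(u, v): cosine_similarity(embeddings[u], embeddings[v]) for u, v in edges}


# Toy data (hypothetical): three proteins with representations at two levels.
level_reprs = {
    "A": [[1.0, 0.0], [1.0]],
    "B": [[1.0, 0.0], [1.0]],
    "C": [[0.0, 1.0], [0.0]],
}
embeddings = {p: concat_levels(r) for p, r in level_reprs.items()}
edges = [("A", "B"), ("B", "C")]
weights = weight_ppi_edges(edges, embeddings)
```

In this toy case the edge between A and B receives a weight close to 1 because their embeddings coincide, while the edge between B and C receives 0 because their embeddings are orthogonal. A core-attachment clustering step would then operate on these weights.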

CLC Number:

  • TP399