计算机科学 ›› 2015, Vol. 42 ›› Issue (7): 108-113.doi: 10.11896/j.issn.1002-137X.2015.07.023

• 2014’全国理论计算机科学年会 • 上一篇    下一篇

PPI网络的改进马尔科夫聚类算法

胡庆生 雷秀娟   

  1. 陕西师范大学计算机科学学院 西安710119
  • 出版日期:2018-11-14 发布日期:2018-11-14
  • 基金资助:
    本文受国家自然科学基金青年基金(61100164,61173190),教育部留学回国人员科研启动基金(教外司留[2012]1707号),中央高校基本科研业务费专项资金项目(GK201402035,GK201302025)资助

Improved MCL Clustering Algorithm in PPI Networks

HU Qing-sheng LEI Xiu-juan   

  • Online:2018-11-14 Published:2018-11-14

摘要: 蛋白质相互作用(PPI)网络是生物信息学的一个新的研究领域。近年来马尔科夫(MCL)聚类算法在未知蛋白质的功能模块预测方面发挥了重要作用,但是聚类质量不高,为此提出了一种基于突变因子和惩罚因子及重新定义解释聚类结果的MCL聚类算法。该算法采用惩罚因子,惩罚质量较大的吸引子;采用突变因子在算法后期断绝初始转移概率对转移概率的束缚。算法在PPI网络数据集上进行了测试,结果表明该算法不但可以抑制小类的产生,而且聚类结果的质量在Avg.F方面相对于基本MCL算法提高了13.1%。

关键词: MCL聚类算法,惩罚因子,突变因子,PPI网络

Abstract: Protein-protein interaction (PPI) network is a new research field in the bioinformatics.Recently MCL clustering algorithm has played an important role in the field of predicting the function of unknown proteins.However,the quality of the clustering result is low.An improved MCL clustering algorithm was proposed in this paper,in which the penalty and mutation factors are introduced.New structured regulation operation takes place of expansion operation in the basic MCL algorithm.Experiments were performed on PPI data.And the results show that the new algorithm can not only minimize the number of small clusters,but also improve F-measure and Avg.F value compared with the basic MCL algorithm.

Key words: MCL clustering algorithm,Penalty factor,Mutation factor,PPI network

[1] Von Mering C,Krause R,Snel B,et al.Comparative assessment of large-scale data sets of protein-protein interactions [J].Nature,2002,417:399-403
[2] Watts D J,Strogatz S H.Collective dynamics of ‘small-world’ networks [J].Nature,1998,339:440-442
[3] Barabasi A L,Oltvai Z N.Network biology:understanding the cell’s functional organization [J].Nature Review Genetics,2004,5(2):101-103
[4] MacQueen J.Some methods for classification and analysis ofmultivariate observations [C]∥Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability.Berkeley,University of California Press,1967:281-297
[5] Guo Hai-xiang,Zhu Ke-Jun,Gao Si-wei,et al.An improved genetic K-means algorithm for optimal clustering [C]∥Procee-dings of 6th IEEE International Conference on Data Mining Workshops.Washington DC:IEEE Computer Society,2006:793-797
[6] Zhang T,Ramakrishnan R,Livny M.An efficient data clustering method for very large databases [C]∥Proceeding ACM SIGMOD Conference on Management of Data.Montreal,Canada,1996:103-114
[7] Ester M,Kriegel H P,Sander J,et al.A density-based algorithm for discovering clusters in Large Spatial Databases with Noise [C]∥Proceedings of 2nd International Conference on Know-ledge Discovery and Data Mining (KDD’96).Portland,OR,1996:226-231
[8] Ben-Hur A,Horn D,Siegelmann T H,et al.Support vectorclustering [J].Journal of Machine Learning Research,2002,2:125-137
[9] Tipping M E.The Relevance Vector Machine [M]∥Advances in Neural Information Processing Systems 12.2000:652-658
[10] Dongen S V.Graph Clustering by Flow Simulation [D].Utrecht:University of Utrecht,2000
[11] Ruan Pei-ying,Hayashida M,Maruyama O,et al.Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels [C]∥BMC Bioinformatics.2014,15,S2:S6
[12] Satuluri V,Parthasarathy S.Scalable graph clustering using stochastic flows:applications to community discovery [C]∥Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York,NY,USA,2009:737-746
[13] Satuluri V,Parthasararthy S,Ucar D.Markov clustering of protein interaction networks with improved balance and scalability [C]∥Proceeding BCB’10 Proceedings of the 1th ACM International Conference on Bioinformatics and Computational Biology.New York,NY,USA,2010:247-256
[14] Shih Y K,Parthasarathy S.Identifying functional modules in interaction networks through overlapping Markov clustering [J].Bioinformatics,2012,8(18):i473-i479
[15] Shih Y K,Parthasarathy S.Identifying protein complexes in AP-MS data with negative evidence via soft Markov clustering [C]∥Proceeding BCB’13 Proceedings of the International Conference on Bioinformatics,Computational Biology and Biomedical Informatics.New York,NY,USA,2013:440
[16] 岑涌,罗林开.一种改善非平衡分布数据SVM分类能力的新策略[J].计算机与数学工程,2006,4(11):103-105 Cen Yong,Lou Lin-kai.A novel strategy for improving the performance of SVM classification for unbalance distribution data[J].Computer & Digital Engineering,2006,4(11):103-105
[17] 方霞,席金菊.基于变异和启发式选择的蚁群优化算法[J].计算机工程与应用,2013,9(24):24-27 Fang Xia,Xi Jin-ju.Ant colony algorithm based on mutation features and selected heuristic[J].Computer Engineering and Applications,2013,9(24):24-27
[18] Güldener U,Munsterkotter M,Kastenmuller G,et al.CYGD:the Comprehensive Yeast Genome Database [J].Nucleic Acids Research,2005,33(1):364-368
[19] Zhang Ai-dong.Protein Interaction Networks [M].UK:Cambridge University Press,2009
[20] 尤梦丽,雷秀娟.PPI 网络聚类的评价方法的研究与应用[J].计算机科学,2013,0(12):254-258 You Meng-li,Lei Xiu-juan.Study and application of evaluating methods of PPI network clustering[J].Computer Science,2013,0(12):254-258

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!