Computer Science (计算机科学), 2014, Vol. 41, Issue (2): 82-86.

• CCML 2013 •

Protein-protein Interaction Prediction Combining Active Learning with SVM

SHI Wen-li (史文丽), GUO Mao-zu (郭茂祖), LI Jin (李晋), LIU Xiao-yan (刘晓燕)

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (60932008, 61172098, 61271346) and the Specialized Research Fund for the Doctoral Program of Higher Education (20112302110040).

Abstract: An active learning algorithm based on the support vector machine (SVM) is proposed for protein-protein interaction prediction. Biological processes in cells are carried out through protein-protein interactions, but determining by wet-lab experiments whether a pair interacts is resource-intensive, and the data are hard to obtain. Active learning is therefore introduced to predict interactions more quickly and accurately when only a limited number of positive samples is available, and to guide the selection of pairs of genes for future experimental characterization so as to accelerate accurate prediction of the human gene interactome. As a method for constructing an effective training set, active learning iteratively samples the most informative data points, namely those that most improve the classification results, thereby reducing the size of the training set. Five active learning algorithms are compared to find the best way to improve the efficiency of the classifier within limited time and resources. Experiments show that, compared with the general SVM, active learning combined with SVM effectively reduces the number of samples needed for learning while maintaining classification performance.

Key words: Support vector machine, Active learning, Protein-protein interaction

CLC number: TP18 Document code: A
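
The iterative sampling described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the five query strategies it compares, so the sketch assumes one common choice, margin sampling (query the unlabeled point closest to the current SVM hyperplane), and a small hand-written linear SVM trained by stochastic subgradient descent. All function names, the two-cluster toy data standing in for interacting and non-interacting pairs, and the hyperparameters are hypothetical.

```python
import random

def decision(w, b, x):
    """Signed score of x under the linear model w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01, seed=0):
    """Linear SVM via stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1
            w = [wj * (1 - lr * lam) for wj in w]      # shrink weights (L2 penalty)
            if violated:                               # hinge-loss subgradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def active_learning(pool, oracle, seed_idx, rounds):
    """Margin sampling: each round, ask the oracle to label the pool point
    that lies closest to the current decision boundary."""
    labeled = list(seed_idx)
    y_labeled = [oracle(i) for i in labeled]           # each label is requested once
    for _ in range(rounds):
        w, b = train_linear_svm([pool[i] for i in labeled], y_labeled)
        chosen = set(labeled)
        candidates = [i for i in range(len(pool)) if i not in chosen]
        query = min(candidates, key=lambda i: abs(decision(w, b, pool[i])))
        labeled.append(query)
        y_labeled.append(oracle(query))
    model = train_linear_svm([pool[i] for i in labeled], y_labeled)
    return model, labeled

def make_toy_pool(n_per_class=100, seed=1):
    """Hypothetical stand-in data: two well-separated 2-D Gaussian clusters."""
    rng = random.Random(seed)
    pool, labels = [], []
    for label, center in ((1, 3.0), (-1, -3.0)):
        for _ in range(n_per_class):
            pool.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            labels.append(label)
    return pool, labels

pool, labels = make_toy_pool()
seed_idx = [0, 100]                                    # one example from each class
(w, b), labeled = active_learning(pool, lambda i: labels[i], seed_idx, rounds=10)
predictions = [1 if decision(w, b, x) >= 0 else -1 for x in pool]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(pool)
print(f"labels requested: {len(labeled)} of {len(pool)}, accuracy: {accuracy:.3f}")
```

The `oracle` callable stands in for the costly wet-lab experiment: the loop only asks for a label when the current model is least certain about a point, which is how active learning keeps the training set small. Replacing the `min(...)` selection with `rng.choice(candidates)` would give a random-sampling baseline for comparison.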

