Computer Science ›› 2014, Vol. 41 ›› Issue (Z6): 387-390.

• Data Mining •

  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Funding:
    Supported by the National Natural Science Foundation of China (61202370), the Key Innovation Project of the Shanghai Municipal Education Commission (12ZZ151), the Shanghai Pujiang Program (11PJ1404300), and the 2013 Graduate Academic Newcomer Cultivation Program (Engineering) of Shanghai Maritime University (GK2013077)

Study on Similarity Learning with Weighted Sampling

LIU Xin-yue and LIU Guang-zhong   

  • Online: 2018-11-14  Published: 2018-11-14

CLC number: TP181  Document code: A

Abstract: Many classification algorithms obtain the similarity between samples from the distance between them, so for such algorithms the way distance is computed is especially important. A study of existing distance metric and similarity learning algorithms shows that most of them select training samples by repeated random sampling with replacement. This scheme gives every training sample an equal probability of being used for metric or similarity learning, but samples at different locations pose different degrees of classification difficulty. If the learning process concentrated on samples that are hard to classify and spent less time on easily classified points, its efficiency would improve and its running time would drop. Saving this time cost is of real significance in the Big Data era.

Key words: Similarity measurement, Distance metric, Weighted sampling, Machine learning, k-NN, Boosting
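The weighting idea the abstract describes — sampling hard-to-classify points more often than easy ones, in the spirit of boosting — can be illustrated with a minimal NumPy sketch. The function names, the k-NN-based difficulty score, and the pair-construction step below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def knn_difficulty_weights(X, y, k=3):
    """Weight each sample by its classification difficulty: points whose
    k nearest neighbours mostly carry a different label are harder to
    classify and receive a larger sampling probability."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest neighbours
    # Difficulty = fraction of neighbours with a different label.
    difficulty = (y[nn] != y[:, None]).mean(axis=1)
    w = difficulty + 1e-3                  # easy points keep a small probability
    return w / w.sum()

def sample_training_pairs(X, y, weights, n_pairs, seed=None):
    """Draw (i, j, same-class) training pairs for similarity learning by
    weighted sampling with replacement, instead of uniform resampling."""
    rng = np.random.default_rng(seed)
    i = rng.choice(len(X), size=n_pairs, p=weights)
    j = rng.choice(len(X), size=n_pairs, p=weights)
    labels = (y[i] == y[j]).astype(int)    # 1 = similar pair, 0 = dissimilar
    return list(zip(i, j, labels))
```

The sampled pairs would then feed whatever pairwise similarity or metric learner is in use; handling of degenerate self-pairs (i == j) is deliberately left out of this sketch.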

