基于相似度差的大间隔快速学习模型

摘要/Abstract

摘要： 许多模式分类方法比如支持向量机和L₂核分类器等都会利用核方法并转化为二次规划问题进行求解,而计算核矩阵需要O(m²)的空间复杂度,求解QP问题则需要O(m³)的时间复杂度,这就使得此类方法在大样本数据上的学习性能非常低下。对此,首次提出了相似度差支持向量机算法DSSVM。算法旨在寻求样本与某类相似度的一个最佳线性表示,并从线性表示的稀疏性以及相似度差意义上的间隔最大化角度构造了新的最优化问题。同时,证明了该算法等价于中心约束型最小包含球问题,这样就可以通过引入最小包含球的快速学习理论将相似度差支持向量机扩展为相似度差核支持向量机DSCVM,从而较好地解决了大规模数据集的分类问题。实验证明了相似度差支持向量机和相似度差核支持向量机的有效性。

关键词: 相似度差,稀疏,核心集,最小包含球,支持向量机

Abstract: Many pattern classification methods,such as support vector machine and L₂ kernel classification,often use the kernel methods and are formulated as quadratic programming problems,but computing kernel matrix would require O(m²)computation,and solving QP may take up to O(m3),which limits these methods to train on large datasets．In this paper,a new classification method called difference of similarity support vector machine(DSSVM)was proposed.DSSVM pursues a best linear representation of a total similarity between any sample and a particular class．According to the sparsity of the linear representation and the max margin of the difference of similarity,a new optimization problem is obtained．Meanwhile,the difference of similarity support vector machine can be equivalently formulated as the center constrained minimal enclosing ball,and thus difference of similarity support vector machine can be extended to diffe-rence of similarity core vector machine(DSCVM)by introducing fast learning theory of minimal enclosing ball,to solve the classification for large datasets．This is confirmed in the experimental studies.

Key words: Difference of similarity,Sparsity,Core set,Minimum enclosing ball,Support vector machine

应文豪,王士同. 基于相似度差的大间隔快速学习模型[J]. 计算机科学, 2013, 40(8): 239-244. https://doi.org/

YING Wen-hao and WANG Shi-tong. Large Margin and Fast Learning Model Based on Difference of Similarity[J]. Computer Science, 2013, 40(8): 239-244. https://doi.org/

参考文献

[1] Cortes C,Vapnik V N．Support vector networks[J]．MachineLearning,1995,0(3):273-297
[2] Chang C-C,Lin C-J．Training v-Support Vector Classifiers:Theoryand Algorithms[J]．Neural Computation,2002,14:1959-1977
[3] Hu M,Chen Y,Kwok J T．Building sparse multi-kernel SVM classifiers[J]．IEEE Transactions on Neural Networks,2009,20(5):827-839
[4] 丁晓剑,赵银亮,李远成.基于SVM的二次下降有效集算法[J].电子学报,2011,39(8):1766-1770
[5] Tax D M J,Duin R P W．Support Vector Data Description[J]．Machine Learning,2004,54(1):45-66
[6] Kim J S,Scott C D．L2kernel classification[J]．IEEE Trans．PAMI,2010,32(10):1822-1831
[7] Yang Y,Wright J,Ma Y,et al.Feature Selection in Face Recognition:A Sparse Representation Perspective[J]．IEEE Trans．PAMI,2009,31(2):210-227
[8] Majumdar A,Ward R K．Classification via group sparsity pro-moting regularization[C]∥IEEE International Conference on Acoustics,Speech and Signal Processing．2009:861-864
[9] Majumdar A,Ward R K．Improved Group Sparse Classifier[J]．Pattern Recognition Letters,2010,31(13):1959-1964
[10] Tibshirani R.Regression shrinkage and selection via the lasso[J]．J．Royal．Statist．Soc B.,1996,58(1):267-288
[11] Donoho D L．For most large underdetermined systems of linear equations the minimal L1-norm solution is also the sparsest solution[J]．Commun．Pure Appl．Math．,2006(6):797-829
[12] 宋枫溪,程科,杨静宇,等.最大散度差和大间距线性投影与支持向量机[J].自动化学报,2004,30(6):890-896
[13] Fine S,Scheinberg K．Efficient SVM training using low-rankkernel representations[J]．J．Mach．Learn．Res.,2001,2:243-264
[14] Deng Z H,Chung F L,Wang S T．FRSDE:Fast Reduced SetDensity Estimator using Minimal Enclosing Ball Approximation[J].Pattern Recognition,2008,1:1363-1372
[15] Tsang I W,Kwok J T,Zurada J M．Generalized core vector machines[J]．IEEE Transactions on Neural Networks,2006,17(5):1126-1140
[16] Chung F L,Deng Z H,Wang S T．From Minimum Enclosing Ball to Fast Fuzzy Inference System Training on Large Datasets[J]．IEEE Transactions on Fuzzy Systems,2009,17(1):173-184
[17] Bǎdoiu M,Clarkson K L．Optimal core sets for balls[J]．Computational Geometry:Theory and Applications,2008,0(1):14-22
[18] Bǎdoiu M,Har-Peled S,Indyk P．Approximate clustering viacore sets[C]∥Proc．34th Annu．ACM Symposium Theory Comput.,Montreal,QC,Canada,2002:250-257
[19] Asharaf S,Murty M N,Shevade S K．Multiclass core vector machine[C]∥Proc．24th ICML．Corvalis,OR,2007:41-48
[20] 胡文军,王士同,邓赵红.适合大样本快速训练的最大夹角间隔核心集向量机[J].电子学报,2011,9(5):1178-1184
[21] Smola A,Schlkopf B．Sparse greedy matrix approximation for machine learning[C]∥ICML’00Proceedings of the 7th International Conference on Mechine Learning Stanford,CA,June 2000:911-918
[22] Asuncion A,Newman D J．UCI Machine Learning Repository．http://www.ics.uci.edu/ ~mlearn/MLRepository.html,Irvine,CA:University of California,School of Information and Computer Science

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed