Computer Science ›› 2009, Vol. 36 ›› Issue (11): 208-212.

• Artificial Intelligence •

An Approximate Training Algorithm for SVM on Very Large Data Sets

ZENG Zhi-qiang, LIAO Bei-shui, GAO Ji

  1. (Department of Computer Science and Technology, Xiamen University of Technology, Xiamen 361024); (College of Computer Science and Technology, Zhejiang University, Hangzhou 310027)
  • Online: 2018-11-16  Published: 2018-11-16
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (60773177), the Fujian Province Young Talents Program (2008F3108), and the Xiamen University of Technology Talent Introduction Program (YKJ08003R).

Approximate Approach to Train SVM on Very Large Data Sets

ZENG Zhi-qiang, LIAO Bei-shui, GAO Ji

  • Online: 2018-11-16  Published: 2018-11-16

Abstract (Chinese): Standard SVM training requires O(l^3) time and O(l^2) space, where l is the number of training samples, and is therefore impractical for very large data sets. This paper proposes an SVM training algorithm based on approximate solutions: the Approximate Vector Machine (AVM). AVM adopts an incremental learning strategy to search for an approximately optimal separating hyperplane, and applies warm-start and sampling tricks during each iteration to accelerate training. Theoretical analysis shows that the computational complexity of the algorithm is independent of the number of training samples, so it scales well in both time and space. Experiments on very large data sets show that the algorithm greatly speeds up training while preserving the generalization performance of the original classifier; moreover, the trained model contains fewer support vectors, so the resulting classifier is also faster at classification time.

Keywords: Support vector machine, Kernel function, Incremental learning, Approximate solution, Core set

Abstract: Standard Support Vector Machine (SVM) training has O(l^3) time and O(l^2) space complexities, where l is the training set size. It is thus computationally infeasible on very large data sets. A novel SVM training method, Approximate Vector Machine (AVM), based on approximate solutions is presented to scale up kernel methods to very large data sets. The approach obtains only an approximately optimal hyperplane by incremental learning, and uses probabilistic speedup and warm-start tricks to accelerate training during each iterative stage. Theoretical analysis indicates that AVM has time and space complexities that are independent of the training set size. Experiments on very large data sets show that the proposed method not only preserves the generalization performance of the original SVM classifiers, but also outperforms existing scale-up methods in terms of training time and number of support vectors.
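The abstract describes the AVM loop only at a high level (grow a core set incrementally, check violators on a random sample rather than the full data, warm-start the coefficients between iterations). The sketch below illustrates that loop shape under stated assumptions; it is not the paper's algorithm. In particular, the kernel-perceptron inner updates stand in for the paper's actual solver, and the RBF kernel, the sample size, and all function names (`rbf`, `train_avm_sketch`, `predict`) are illustrative choices.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def train_avm_sketch(X, y, iters=30, sample_size=59, inner_steps=5, seed=0):
    """Core-set style approximate training (illustrative sketch only):
    each iteration samples a small subset, adds the worst violator to
    the core set, and warm-starts the dual coefficients."""
    rng = np.random.default_rng(seed)
    core = [int(rng.integers(len(X)))]   # indices of the current core set
    alpha = np.zeros(1)                  # dual coefficients, kept across iterations (warm start)
    for _ in range(iters):
        # probabilistic speedup: look for a violator in a small random sample only
        cand = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
        margins = y[cand] * (rbf(X[cand], X[core]) @ (alpha * y[core]))
        if margins.min() > 0:            # no violator in the sample: stop early
            break
        worst = int(cand[np.argmin(margins)])
        if worst not in core:            # expand the core set by the worst violator
            core.append(worst)
            alpha = np.append(alpha, 0.0)
        # cheap inner solver over the core set only; kernel-perceptron updates
        # stand in here for an exact QP solver
        Kc = rbf(X[core], X[core])
        for _ in range(inner_steps):
            m = y[core] * (Kc @ (alpha * y[core]))
            viol = m <= 0
            if not viol.any():
                break
            alpha[viol] += 1.0
    return np.array(core), alpha

def predict(X_new, X, y, core, alpha):
    """Decision uses the core-set expansion only, so prediction cost scales with |core|."""
    return np.sign(rbf(X_new, X[core]) @ (alpha * y[core]))
```

Because per-iteration work depends only on the sample size and the current core-set size, and the core set grows by at most one point per iteration, the cost of this loop shape is bounded independently of l, which mirrors the scalability claim in the abstract. The small core set also plays the role of the support-vector set at prediction time.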

Key words: Support vector machine, Kernel function, Incremental learning, Approximate solution, Core set
