Computer Science ›› 2022, Vol. 49 ›› Issue (6A): 297-300. doi: 10.11896/jsjkx.210400149

• Big Data & Data Science •

Acceleration of SVM for Multi-class Classification

CHEN Jing-nian

  1. Department of Information and Computing Science, Shandong University of Finance and Economics, Jinan 250014, China
  • Online: 2022-06-10 Published: 2022-06-08
  • Corresponding author: CHEN Jing-nian (jnchen06@163.com)
  • About author: CHEN Jing-nian, born in 1970, Ph.D., professor, supervisor, is a senior member of China Computer Federation. His main research interests include big data analysis and intelligent information processing.
  • Supported by:
    National Natural Science Foundation of China (61773325).

Abstract: With excellent classification performance and a solid theoretical foundation, support vector machines (SVMs) have become one of the most important classification methods in pattern recognition, machine learning, and data mining in recent years. However, their training time grows markedly as the number of training instances increases, and training becomes even more complex for multi-class problems. To address these problems, a fast training-data reduction method named MOIS is proposed for multi-class classification. Using cluster centers as reference points, the method deletes redundant training instances and selects the boundary instances that are decisive for training, thereby greatly reducing the training data while also relieving the distribution imbalance between classes. Experiments show that MOIS greatly improves training efficiency while maintaining or even improving the classification accuracy of SVMs. For example, on the Optdigit dataset, the classification accuracy increases from 98.94% to 99.05% while the training time is reduced to 15% of the original. On the dataset formed by the first 100 classes of HCL2000, the training time is reduced to less than 6% of the original while the accuracy improves slightly, from 99.29% to 99.30%. Furthermore, MOIS itself is highly efficient.

Key words: Clustering, Data reduction, Instance selection, Multi-class classification, Support vector machines
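
The abstract gives only the outline of the reduction idea: use cluster centers as reference points, drop redundant instances, and keep boundary instances. The Python sketch below illustrates that general idea and is not the MOIS algorithm itself, whose details are not stated on this page. The names `class_centers` and `select_boundary_instances`, the use of one mean vector per class as its "cluster center", the scoring rule, and the `keep_per_class` parameter are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def class_centers(X, y):
    """Return one reference point per class: the mean of its samples.

    Assumption: a single mean vector stands in for the cluster center.
    """
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def select_boundary_instances(X, y, keep_per_class):
    """Return sorted indices of the samples kept for training.

    Hypothetical scoring rule (not taken from the paper):
        score = distance to nearest other-class center
                - distance to own-class center
    A small score means the sample lies toward a rival class, so it is
    kept as a boundary instance; samples with large scores are treated
    as redundant interior points. Requires at least two classes.
    """
    centers = class_centers(X, y)
    scored = defaultdict(list)
    for idx, (x, label) in enumerate(zip(X, y)):
        own = math.dist(x, centers[label])
        rival = min(
            math.dist(x, c) for other, c in centers.items() if other != label
        )
        scored[label].append((rival - own, idx))

    kept = []
    for items in scored.values():
        items.sort()
        # Using the same cap for every class also evens out class sizes.
        kept.extend(idx for _, idx in items[:keep_per_class])
    return sorted(kept)


# Toy data: two classes on a line; samples at x=4 and x=6 face each other.
X = [[0, 0], [1, 0], [4, 0], [10, 0], [9, 0], [6, 0]]
y = [0, 0, 0, 1, 1, 1]
kept = select_boundary_instances(X, y, keep_per_class=1)
```

In this toy case the two samples nearest the opposing class are the ones retained. The reduced index list could then be used to subset the training data before passing it to any SVM trainer; how much accuracy is preserved would depend on the real selection rule and on the data.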

CLC number:

  • TP391