Computer Science, 2022, Vol. 49, Issue 6A: 297-300. doi: 10.11896/jsjkx.210400149
陈景年
CHEN Jing-nian
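Abstract: Support vector machines (SVMs), with their excellent classification performance and solid theoretical foundation, have become one of the most important classification methods in pattern recognition, machine learning, and data mining in recent years. However, their training time grows markedly as the number of samples increases, and model training becomes more complex for multi-class problems. To address these problems, a fast training-data reduction method suited to multi-class problems, MOIS, is proposed. Using cluster centers as reference points, the method removes redundant training samples while selecting the decisive boundary samples, which greatly reduces the training data and alleviates distribution imbalance between classes. Experimental results show that MOIS substantially improves training efficiency while maintaining or even improving SVM classification performance. For example, on the Optdigit dataset, the proposed method raises classification accuracy from 98.94% to 99.05% while cutting training time to 15% of the original. On the dataset formed by the first 100 classes of HCL2000, accuracy rises slightly (from 99.29% to 99.30%) while training time drops to less than 6% of the original. In addition, MOIS itself runs very efficiently.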
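The abstract does not give the selection rule used by MOIS, so the following is only a minimal sketch of the general idea (cluster centers as reference points, boundary samples kept, redundant interior samples dropped), not the published algorithm. Everything specific here is an assumption made for illustration: the function name `select_boundary_samples`, the use of one class mean as each class's center, the score (distance to the nearest rival center minus distance to the own center), and the `max_per_class` cap, which is one simple way to even out class sizes.

```python
import numpy as np

def select_boundary_samples(X, y, max_per_class):
    """Return sorted indices of the retained samples (hypothetical criterion).

    Assumption: a sample is treated as a boundary sample when it is close to
    another class's center relative to its own class center.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    # One reference point per class: the class mean (k-means with k = 1).
    centers = np.stack([X[y == c].mean(axis=0) for c in classes])
    # dists[i, j] = Euclidean distance from sample i to the center of class j.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    rows = np.arange(len(y))
    own_col = np.searchsorted(classes, y)   # column of each sample's own class
    d_own = dists[rows, own_col]
    rival = dists.copy()
    rival[rows, own_col] = np.inf           # mask out the own-class center
    d_rival = rival.min(axis=1)
    # Lower score = nearer to a rival center than typical = likelier boundary.
    score = d_rival - d_own
    keep = []
    for c in classes:
        members = np.flatnonzero(y == c)
        order = np.argsort(score[members], kind="stable")
        # Capping every class at the same size also reduces class imbalance.
        keep.extend(members[order[:max_per_class]])
    return np.sort(np.array(keep))

# Two 1-D classes laid out along a line; the samples facing the other class are kept.
X = [[0], [1], [2], [3], [7], [8], [9], [10]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(select_boundary_samples(X, y, max_per_class=2))
```

The reduced index set would then be used to subset the training data before fitting an SVM. The pairwise distance matrix costs memory proportional to samples times classes times features, so a practical version would compute it in chunks.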