Computer Science, 2016, Vol. 43, Issue (5): 238-242. doi: 10.11896/j.issn.1002-137X.2016.05.044
LIU Ze-shen (刘泽燊) and PAN Zhi-song (潘志松)
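Abstract: As data volumes keep growing, the parallel design of support vector machines (SVMs) has become a research focus in data mining. SVM training on large-scale data suffers from slow optimization and heavy memory usage. To address this, this paper proposes a parallel SVM algorithm based on the Spark platform (SP-SVM). The method modifies the merging strategy and training structure of the Cascade SVM and implements the result on the Spark distributed computing framework. It then analyzes the performance of the parallel operators and optimizes the parallel implementation accordingly, which effectively overcomes the low training efficiency of the cascade model. Experimental results show that, at the cost of a small loss in accuracy, the new parallel training method reduces training time to a certain extent and improves the learning efficiency of the model.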
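The abstract does not say how SP-SVM changes the cascade's merging strategy. The sketch below therefore shows only the classic single-pass Cascade SVM (Graf et al., NIPS 2004) that SP-SVM builds on. It trains an SVM on each data partition and keeps only that SVM's support vectors. It then merges the surviving sets pairwise and retrains, repeating until one model remains. This is a minimal plain-Python sketch under stated assumptions, not the authors' implementation:

- The per-partition solver is a linear SVM trained by dual coordinate descent (the approach LIBLINEAR uses), swapped in for illustration.
- The bias is folded into the weight vector.
- The feedback pass of the original Cascade SVM is omitted.
- All function names and the synthetic dataset are hypothetical.
- A plain loop stands in for the Spark cluster.

```python
import random


def train_linear_svm(points, C=1.0, epochs=200):
    """Hypothetical helper: L1-loss linear SVM via dual coordinate descent.

    points: list of ((x1, x2, ...), label) pairs, label in {+1, -1}.
    A constant 1.0 is appended to every feature vector so the bias is
    learned (and regularized) as an ordinary weight.
    Returns the weight vector w (bias last) and the dual variables alphas.
    """
    xs = [list(x) + [1.0] for x, _ in points]
    ys = [y for _, y in points]
    w = [0.0] * len(xs[0])
    alphas = [0.0] * len(points)
    q_diag = [sum(v * v for v in x) for x in xs]  # Q_ii = x_i . x_i
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, ys)):
            grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            old = alphas[i]
            # Exact minimization along alpha_i, projected onto the box [0, C]
            new = min(max(old - grad / q_diag[i], 0.0), C)
            if new != old:
                alphas[i] = new
                for j, xj in enumerate(x):
                    w[j] += (new - old) * y * xj  # keep w = sum alpha_i y_i x_i
    return w, alphas


def support_vectors(points, alphas, eps=1e-8):
    """Keep only the training points with a nonzero dual variable."""
    return [p for p, a in zip(points, alphas) if a > eps]


def cascade_svm(partitions, C=1.0):
    """Single-pass binary-tree cascade; returns (final w, final support vectors)."""
    # Layer 1: one independent SVM per partition. In a Spark job this step
    # could run inside RDD.mapPartitions; here a plain loop stands in for it.
    layer = []
    for part in partitions:
        w, alphas = train_linear_svm(part, C)
        layer.append(support_vectors(part, alphas))
    # Later layers: merge neighbouring support-vector sets pairwise and retrain.
    while len(layer) > 1:
        next_layer = []
        for k in range(0, len(layer), 2):
            merged = layer[k] + (layer[k + 1] if k + 1 < len(layer) else [])
            w, alphas = train_linear_svm(merged, C)
            next_layer.append(support_vectors(merged, alphas))
        layer = next_layer
    return w, layer[0]


def predict(w, x):
    """Sign of the linear decision function (bias is the last weight)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def make_data(n_pairs=40, seed=0):
    """Synthetic data: positive points in [2.5, 3.5]^2 and their mirror images."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_pairs):
        p = (rng.uniform(2.5, 3.5), rng.uniform(2.5, 3.5))
        data.append((p, 1))
        data.append(((-p[0], -p[1]), -1))
    return data


if __name__ == "__main__":
    data = make_data()
    partitions = [data[i * 20:(i + 1) * 20] for i in range(4)]  # 4 "workers"
    w, svs = cascade_svm(partitions)
    accuracy = sum(predict(w, x) == y for x, y in data) / len(data)
    print(f"kept {len(svs)} of {len(data)} points; training accuracy {accuracy:.2f}")
```

Only support vectors move between layers, so each merge retrains on a handful of points rather than on whole partitions. That is where a cascade saves time. The trade-off is that a single pass without feedback may only approximate an SVM trained on all the data at once.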