Computer Science, 2016, Vol. 43, Issue (5): 238-242. doi: 10.11896/j.issn.1002-137X.2016.05.044


Research on Parallel SVM Algorithm Based on Spark

LIU Ze-shen and PAN Zhi-song   

  • Online:2018-12-01 Published:2018-12-01

Abstract: As data volumes keep growing, the parallel design of the support vector machine (SVM) has become an active research topic in data mining. To address the slow optimization and high memory consumption of model training, we propose SP-SVM, a new parallel SVM algorithm based on Spark. First, we implement the algorithm on the Spark parallel computing framework. Second, we analyze the performance of the parallel operators and optimize the parallel design of the algorithm, which resolves the low efficiency of the cascade training model. Experimental results show that the new parallel training method shortens training time and substantially improves efficiency at the cost of a small loss in precision.

Key words: Parallel computing, Support vector machine, Large-scale data, Cascade model, Spark
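
The abstract refers to a cascade training model without describing it. As a rough illustration only, the generic cascade idea (train an SVM on each data partition, keep only the support vectors, merge the surviving sets pairwise, and retrain until one set remains) can be sketched in plain Python. This is not the authors' SP-SVM and does not use Spark: the partitions are ordinary lists standing in for distributed partitions, the solver is a simple dual coordinate descent for a linear SVM chosen for brevity, and every name here (`train_linear_svm`, `support_vectors`, `cascade_svm`, the toy data) is a hypothetical chosen for this sketch.

```python
import random

def train_linear_svm(data, C=1.0, sweeps=50):
    """Dual coordinate descent for a linear SVM with hinge loss.

    data: list of (features, label) pairs with label in {-1, +1}.
    A constant 1.0 is appended to each feature vector to act as a bias.
    Returns the weight vector and the dual variables (alphas).
    """
    xs = [list(x) + [1.0] for x, _ in data]
    ys = [y for _, y in data]
    w = [0.0] * len(xs[0])
    alphas = [0.0] * len(xs)
    for _ in range(sweeps):
        for i, (x, y) in enumerate(zip(xs, ys)):
            grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            q_ii = sum(xj * xj for xj in x)
            # Newton step on one dual variable, clipped to the box [0, C].
            new_alpha = min(max(alphas[i] - grad / q_ii, 0.0), C)
            delta = (new_alpha - alphas[i]) * y
            w = [wj + delta * xj for wj, xj in zip(w, x)]
            alphas[i] = new_alpha
    return w, alphas

def support_vectors(data):
    """Train on one subset and keep only the points with a nonzero alpha."""
    _, alphas = train_linear_svm(data)
    return [point for point, a in zip(data, alphas) if a > 1e-8]

def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def cascade_svm(partitions):
    """Filter each partition to its support vectors, then merge pairwise."""
    # On a cluster, this first pass is the step that would run per partition.
    level = [support_vectors(p) for p in partitions]
    while len(level) > 1:
        merged = [sum(level[i:i + 2], []) for i in range(0, len(level), 2)]
        level = [support_vectors(m) for m in merged]
    w, _ = train_linear_svm(level[0])
    return w, level[0]

# Toy data (an assumption for this sketch): two well-separated 2-D clusters.
rng = random.Random(0)
points = []
for _ in range(100):
    y = rng.choice([-1, 1])
    points.append(([3.0 * y + rng.uniform(-1, 1),
                    3.0 * y + rng.uniform(-1, 1)], y))

partitions = [points[i::4] for i in range(4)]  # four simulated partitions
w, survivors = cascade_svm(partitions)
accuracy = sum(predict(w, x) == y for x, y in points) / len(points)
print(len(survivors), accuracy)
```

Each level passes on only the support vectors, so the later and more expensive retraining steps see far fewer points than the original data set. The trade-off is that the early levels run in parallel while the merge levels become progressively more sequential, which is one plausible source of the cascade inefficiency the abstract mentions.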

