Computer Science ›› 2022, Vol. 49 ›› Issue (7): 64-72. doi: 10.11896/jsjkx.210500040

• Database & Big Data & Data Science •

Parallel Support Vector Machine Algorithm Based on Clustering and WOA

LIU Wei-ming, AN Ran, MAO Yi-min   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Received: 2021-04-30 Revised: 2021-05-17 Online: 2022-07-15 Published: 2022-07-12
  • About author: LIU Wei-ming, born in 1964, professor, master supervisor. His main research interests include data mining, big data and so on.
    MAO Yi-min, born in 1970, Ph.D., professor, master supervisor. Her main research interests include data mining, big data and so on.
  • Supported by:
    National Natural Science Foundation of China (41562019), National Key Research and Development of China (2018YFC1504705) and Science and Technology Foundation of Jiangxi Province (GJJ151528, GJJ151531).

Abstract: In big data environments, parallel support vector machines (SVMs) are sensitive to redundant data, optimize their parameters poorly, and suffer from load imbalance during parallel execution. To address these problems, this paper proposes MR-KWSVM, a parallel SVM algorithm based on a clustering algorithm and the whale optimization algorithm (WOA). First, a K-means and Fisher (KF) strategy deletes redundant data, and the SVM is trained on the reduced data set, which lowers its sensitivity to redundant data. Second, an improved whale optimization algorithm based on a nonlinear convergence factor and a self-adaptive inertia weight (IW-BNAW) is proposed and used to search for the optimal SVM parameters, which improves the parameter optimization ability of the SVM. Finally, when the parallel SVM is constructed with MapReduce, a time feedback (TFB) strategy schedules the load across reduce nodes, which improves the parallel efficiency of the cluster and yields a highly parallel SVM. Experimental results show that the proposed algorithm preserves the high parallel computing capability of SVM in a big data environment while significantly improving its classification accuracy and generalization performance.
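As a rough illustration of the second step, the following sketch shows a whale optimization loop with a nonlinear convergence factor and an inertia weight. It is not the paper's IW-BNAW: the abstract does not give the formulas, so the cosine decay of `a`, the linear schedule for `w`, the function name `woa_minimize`, and the toy objective (a stand-in for a cross-validated SVM error over log10(C) and log10(gamma)) are all assumptions made for illustration.

```python
import math
import random

# Hypothetical search box for (log10 C, log10 gamma); not taken from the paper.
BOUNDS = [(-3.0, 3.0), (-5.0, 1.0)]


def toy_objective(x):
    """Stand-in for a cross-validated SVM error; minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def woa_minimize(objective, bounds, n_whales=20, n_iter=100, seed=0):
    """Whale optimization sketch with assumed schedules for a and w."""
    rng = random.Random(seed)

    def clip(x):
        # Keep every coordinate inside its search interval.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    fitness = [objective(w_) for w_ in whales]
    best_idx = min(range(n_whales), key=fitness.__getitem__)
    best, best_fit = list(whales[best_idx]), fitness[best_idx]

    for t in range(n_iter):
        frac = t / n_iter
        # Assumed nonlinear convergence factor: decays from 2 toward 0 on a cosine curve.
        a = 2.0 * math.cos(0.5 * math.pi * frac)
        # Assumed inertia weight: shrinks linearly from 0.9 toward 0.4.
        w = 0.9 - 0.5 * frac
        for i in range(n_whales):
            x = whales[i]
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around a random whale.
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [w * r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                # Spiral update around the best whale (spiral constant b = 1).
                l = rng.uniform(-1.0, 1.0)
                spiral = math.exp(l) * math.cos(2.0 * math.pi * l)
                new = [abs(b - xi) * spiral + w * b for b, xi in zip(best, x)]
            new = clip(new)
            f = objective(new)
            whales[i] = new
            fitness[i] = f
            if f < best_fit:
                best, best_fit = list(new), f
    return best, best_fit


if __name__ == "__main__":
    print(woa_minimize(toy_objective, BOUNDS))
```

In a real setting, `toy_objective` would be replaced by a function that trains an SVM with the candidate parameters and returns its validation error; that evaluation is the expensive part and is what a parallel framework such as MapReduce would distribute.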

Key words: IW-BNAW algorithm, KF strategy, MapReduce framework, SVM algorithm, TFB strategy

CLC Number: TP338