Computer Science, 2022, Vol. 49, Issue (7): 64-72. doi: 10.11896/jsjkx.210500040

• Database & Big Data & Data Science

Parallel Support Vector Machine Algorithm Based on Clustering and WOA

LIU Wei-ming, AN Ran, MAO Yi-min

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Received: 2021-04-30 Revised: 2021-05-17 Online: 2022-07-15 Published: 2022-07-12
  • Corresponding author: MAO Yi-min (mymlyc@163.com)
  • About author: (m9178mar_45@126.com)
    LIU Wei-ming, born in 1964, professor and master's supervisor. His main research interests include data mining and big data.
    MAO Yi-min, born in 1970, Ph.D., professor and master's supervisor. Her main research interests include data mining and big data.
  • Supported by:
    National Natural Science Foundation of China (41562019), National Key Research and Development Program of China (2018YFC1504705), and Science and Technology Project of the Education Department of Jiangxi Province (GJJ151528, GJJ151531).

Abstract: Parallel support vector machines (SVM) in big data environments are sensitive to redundant data, have poor parameter optimization ability, and suffer from load imbalance during parallel execution. To address these problems, a parallel support vector machine algorithm named MR-KWSVM, based on a clustering algorithm and the whale optimization algorithm (WOA), is proposed. First, the algorithm proposes a K-means and Fisher (KF) strategy to delete redundant data and trains the SVM on the reduced data set, which lowers the sensitivity of the SVM to redundant data. Second, an improved whale optimization algorithm based on a nonlinear convergence factor and an adaptive inertia weight (IW-BNAW) is proposed and used to obtain the optimal SVM parameters, which improves the parameter optimization ability of the SVM. Finally, when the parallel SVM is constructed with MapReduce, a time feedback (TFB) strategy is proposed for load scheduling of the reduce nodes, which improves the parallel efficiency of the cluster and yields a highly parallel SVM. Experimental results show that the proposed algorithm not only preserves the high parallel computing capability of SVM in a big data environment, but also significantly improves the classification accuracy of the SVM and achieves better generalization performance.

Key words: IW-BNAW algorithm, KF strategy, MapReduce framework, SVM algorithm, TFB strategy
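
The abstract does not give the IW-BNAW update rules, so the following is only a minimal sketch of the general idea: a whale optimization loop in which the convergence factor decays nonlinearly and an inertia weight scales the attracting position. The cosine decay in `nonlinear_a`, the linear schedule in `inertia_weight`, the function names, and the toy objective are all assumptions made for illustration, not the formulas of the paper. In an SVM setting the objective would typically be a cross-validation error over parameters such as the penalty and kernel width; a simple quadratic stands in for it here.

```python
import math
import random


def nonlinear_a(t, max_iter):
    """Assumed nonlinear convergence factor: cosine decay from 2 down to 0."""
    return 1.0 + math.cos(math.pi * t / max_iter)


def inertia_weight(t, max_iter, w_max=0.9, w_min=0.4):
    """Assumed inertia weight: shrinks linearly from w_max towards w_min."""
    return w_max - (w_max - w_min) * t / max_iter


def clip(x, bounds):
    """Keep every coordinate inside its (low, high) interval."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]


def woa_minimize(objective, bounds, n_whales=20, max_iter=100, seed=0):
    """WOA-style search; returns (best position, best value, best-value history)."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_val = objective(best)
    history = []
    for t in range(max_iter):
        a = nonlinear_a(t, max_iter)
        w = inertia_weight(t, max_iter)
        for i, x in enumerate(whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale when |A| < 1, otherwise explore
                # around a randomly chosen whale.
                target = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [w * tj - A * abs(C * tj - xj) for tj, xj in zip(target, x)]
            else:
                # Spiral move around the (weighted) best position.
                l = rng.uniform(-1, 1)
                new = [abs(bj - xj) * math.exp(l) * math.cos(2 * math.pi * l) + w * bj
                       for bj, xj in zip(best, x)]
            whales[i] = clip(new, bounds)
            val = objective(whales[i])
            if val < best_val:
                best, best_val = whales[i][:], val
        history.append(best_val)
    return best, best_val, history


if __name__ == "__main__":
    # Toy stand-in for an SVM validation error over two parameters.
    toy = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
    print(woa_minimize(toy, [(-5.0, 5.0), (-5.0, 5.0)]))
```

Because the best value is only replaced when a strictly better candidate appears, the returned history never increases. How quickly it falls depends on the assumed schedules, and multiplying the target by a weight below 1 biases the search towards the origin, which is one reason the actual weighting scheme of the paper may differ.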

CLC Number:

  • TP338