Computer Science, 2022, Vol. 49, Issue (7): 64-72. doi: 10.11896/jsjkx.210500040
LIU Wei-ming, AN Ran, MAO Yi-min
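Abstract: In big data environments, parallel SVM is sensitive to redundant data, has weak parameter-optimization ability, and suffers from load imbalance during parallel execution. To address these problems, this paper proposes MR-KWSVM, a parallel support vector machine algorithm based on a clustering algorithm and the whale optimization algorithm. First, the algorithm introduces a KF strategy that prunes redundant data; the SVM is then trained on the pruned dataset, which reduces its sensitivity to redundant data. Second, it proposes IW-BNAW, a whale optimization algorithm based on a nonlinear convergence factor and an adaptive inertia weight, and uses IW-BNAW to obtain the optimal SVM parameters, improving the SVM's parameter-search ability. Finally, when the parallel SVM is built with MapReduce, a time-feedback strategy is proposed for load scheduling across the reduce nodes, which improves the parallel efficiency of the cluster and yields a highly parallel SVM. Experimental results show that the proposed algorithm not only preserves the SVM's high parallel computing capability in big data environments but also clearly improves its classification accuracy and achieves better generalization performance.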
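The abstract does not say how the KF strategy decides which samples are redundant. A common clustering-based idea, sketched below purely as an assumption (it is not presented as the authors' KF strategy), is to cluster the training set with k-means, keep every sample in clusters that contain more than one class label (such clusters tend to sit near the decision boundary, where support vectors come from), and keep only a few representatives of each single-class cluster. The names `kmeans`, `prune_redundant`, and `keep_per_pure_cluster` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [points[i] for i, a in enumerate(assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def prune_redundant(points, labels, k, keep_per_pure_cluster=1, seed=0):
    """Hypothetical clustering-based pruning; returns sorted indices to keep."""
    assign, centroids = kmeans(points, k, seed=seed)
    kept = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue
        if len({labels[i] for i in idx}) > 1:
            # Mixed-label cluster: likely near the class boundary, keep all of it.
            kept.extend(idx)
        else:
            # Single-label cluster: keep only the samples closest to its centroid.
            idx.sort(key=lambda i: sq_dist(points[i], centroids[c]))
            kept.extend(idx[:keep_per_pure_cluster])
    return sorted(kept)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]
    print(prune_redundant(pts, [0, 0, 0, 1, 1, 1], k=2))
```

Raising `keep_per_pure_cluster` trades training-set size for safety: if a single-class cluster does reach the boundary, keeping more of its points reduces the risk of discarding a future support vector.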
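The exact IW-BNAW update formulas are not given in the abstract. As a rough illustration only, the sketch below implements the standard whale optimization algorithm (encircling the leader, random search, and the bubble-net spiral) with two hypothetical modifications: a convergence factor a(t) = 2cos(πt/2T) that falls nonlinearly from 2 to 0 instead of linearly, and a per-whale inertia weight w that scales the reference position, equal to `w_max` for the fittest whale and shrinking to `w_min` for the worst. In MR-KWSVM the objective would be an SVM validation error over parameters such as the penalty C and the RBF kernel width γ; here a toy quadratic over (log10 C, log10 γ) stands in. The names `woa_minimize`, `w_min`, `w_max`, and `toy_error` are assumptions.

```python
import math
import random


def woa_minimize(f, bounds, n_whales=20, max_iter=100, b=1.0,
                 w_min=0.4, w_max=1.0, seed=0):
    """Minimize f over a box with a whale optimization variant.

    Hypothetical choices (not the paper's IW-BNAW formulas): convergence factor
    a(t) = 2*cos(pi*t / (2*max_iter)) and an inertia weight w scaling the
    reference position, from w_max (fittest whale) down to w_min (worst whale).
    Returns (best position, best fitness).
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clamp(x):
        # Keep each coordinate inside its [lo, hi] search interval.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    fitness = [f(x) for x in whales]
    best_i = min(range(n_whales), key=lambda i: fitness[i])
    best, best_f = list(whales[best_i]), fitness[best_i]

    for t in range(max_iter):
        a = 2.0 * math.cos(math.pi * t / (2 * max_iter))  # nonlinear decay 2 -> 0
        f_best, f_worst = best_f, max(fitness)  # snapshot for this iteration
        spread = f_worst - f_best
        for i in range(n_whales):
            x = whales[i]
            # Relative fitness in [0, 1]: 0 for the best, 1 for the worst whale.
            ratio = (fitness[i] - f_best) / spread if spread > 0 else 0.0
            w = w_max - (w_max - w_min) * ratio
            if rng.random() < 0.5:
                A = [2 * a * rng.random() - a for _ in range(dim)]
                C = [2 * rng.random() for _ in range(dim)]
                # |A| < 1 in every dimension: encircle the leader (exploitation);
                # otherwise move relative to a random whale (exploration).
                ref = best if max(abs(v) for v in A) < 1 else whales[rng.randrange(n_whales)]
                new = [w * ref[j] - A[j] * abs(C[j] * ref[j] - x[j]) for j in range(dim)]
            else:
                # Bubble-net spiral around the leader.
                l = rng.uniform(-1.0, 1.0)
                new = [abs(best[j] - x[j]) * math.exp(b * l) * math.cos(2 * math.pi * l)
                       + w * best[j] for j in range(dim)]
            whales[i] = clamp(new)
            fitness[i] = f(whales[i])
            if fitness[i] < best_f:
                best, best_f = list(whales[i]), fitness[i]
    return best, best_f


if __name__ == "__main__":
    # Toy stand-in for SVM validation error over (log10 C, log10 gamma);
    # the minimum at (1, -1) is arbitrary, not taken from the paper.
    def toy_error(p):
        return (p[0] - 1.0) ** 2 + (p[1] + 1.0) ** 2

    pos, err = woa_minimize(toy_error, [(-2.0, 3.0), (-4.0, 1.0)])
    print("C = %.3g, gamma = %.3g, error = %.3g" % (10 ** pos[0], 10 ** pos[1], err))
```

Searching in log10 space is a common choice for SVM hyperparameters because C and γ typically span several orders of magnitude; the early, large values of a favor exploration and the later, small values favor local refinement around the best whale.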
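The abstract only states that a time-feedback strategy schedules load across the reduce nodes. One way such feedback could work, sketched here as an assumption rather than the paper's exact rule: each reducer's processing rate measured in the previous round is smoothed into a running estimate, and the next round's partitions are assigned largest-first to whichever reducer would finish earliest given that estimate. `update_rates`, `assign_partitions`, and the smoothing factor `alpha` are hypothetical names, and partition sizes are counted in records.

```python
def update_rates(rates, processed, elapsed, alpha=0.5):
    """Blend each reducer's previous rate estimate (records/s) with the rate
    measured in the last round: new = alpha * measured + (1 - alpha) * old."""
    updated = dict(rates)
    for r, secs in elapsed.items():
        if secs > 0 and r in processed:
            measured = processed[r] / secs
            # A reducer with no previous estimate simply adopts the measured rate.
            updated[r] = alpha * measured + (1 - alpha) * rates.get(r, measured)
    return updated


def assign_partitions(partition_sizes, rates):
    """Greedy largest-first assignment: each partition goes to the reducer whose
    predicted finish time (queued records / rate) is smallest after taking it.
    Returns (plan: reducer -> partition ids, predicted finish time per reducer)."""
    plan = {r: [] for r in rates}
    finish = {r: 0.0 for r in rates}
    for pid in sorted(partition_sizes, key=lambda p: (-partition_sizes[p], p)):
        size = partition_sizes[pid]
        # Ties are broken by reducer id so the plan is deterministic.
        target = min(rates, key=lambda r: (finish[r] + size / rates[r], r))
        plan[target].append(pid)
        finish[target] += size / rates[target]
    return plan, finish


if __name__ == "__main__":
    # Last round: r2 took four times as long as r1 for the same 400 records.
    rates = update_rates({"r1": 100.0, "r2": 100.0},
                         {"r1": 400, "r2": 400}, {"r1": 2.0, "r2": 8.0})
    plan, _ = assign_partitions({"p1": 300, "p2": 200, "p3": 100}, rates)
    print(plan)  # {'r1': ['p1', 'p3'], 'r2': ['p2']}
```

Because the rates come from measured run times, a reducer that ran slowly in one round receives less data in the next, which is the balancing effect a time-feedback scheduler aims for; `alpha` controls how quickly the estimates react to a single noisy round.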