Computer Science, 2017, Vol. 44, Issue (12): 33-37. doi: 10.11896/j.issn.1002-137X.2017.12.006

Study of ELM Algorithm Parallelization Based on Spark

LIU Peng, WANG Xue-kui, HUANG Yi-hua, MENG Lei and DING En-jie   

Online: 2018-12-01   Published: 2018-12-01

Abstract: The extreme learning machine (ELM) trains quickly, but its training involves many matrix operations, so its efficiency is still poor when it is applied to massive amounts of data. After a thorough study of parallel computation on Spark resilient distributed datasets (RDDs), we propose and implement a Spark-based parallel ELM algorithm. For performance comparison, a Hadoop MapReduce-based version is also implemented. Experimental results show that the training efficiency of the Spark-based parallel ELM algorithm is significantly higher than that of the Hadoop MapReduce-based version, and the larger the amount of data processed, the more pronounced Spark's efficiency advantage becomes.

Key words: ELM, Parallelization, Spark, RDD, Hadoop, MapReduce
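
The abstract attributes ELM's poor efficiency on massive data to its matrix operations. It does not say how the computation is split across RDD partitions. One common decomposition relies on the fact that the regularized output weights β = (H^T H + I/C)^(-1) H^T T depend on the data only through H^T H and H^T T. Both of these are sums of per-sample terms. Each partition can therefore compute its local sums independently (a map step). The partial sums are then added together (a reduce step) before a single small L×L system is solved. The Python sketch below illustrates this decomposition as an assumption. It is not presented as the authors' method. It simulates partitions with plain lists instead of a Spark RDD. The function names (`partition_stats`, `merge_stats`, `train_elm`), the sigmoid activation, the single output, and the ridge constant C are hypothetical, illustrative choices.

```python
import math
import random
from functools import reduce


def hidden_layer(x, weights, biases):
    """Sigmoid hidden-layer output h(x) = g(x . w_j + b_j) for one sample x."""
    return [
        1.0 / (1.0 + math.exp(-(sum(xi * wi for xi, wi in zip(x, w)) + b)))
        for w, b in zip(weights, biases)
    ]


def partition_stats(rows, weights, biases):
    """Map step (hypothetical): local U_k = H_k^T H_k and V_k = H_k^T T_k."""
    n_hidden = len(biases)
    U = [[0.0] * n_hidden for _ in range(n_hidden)]
    V = [0.0] * n_hidden
    for x, t in rows:
        h = hidden_layer(x, weights, biases)
        for i in range(n_hidden):
            V[i] += h[i] * t
            for j in range(n_hidden):
                U[i][j] += h[i] * h[j]
    return U, V


def merge_stats(a, b):
    """Reduce step (hypothetical): element-wise sum of two (U, V) pairs."""
    (Ua, Va), (Ub, Vb) = a, b
    U = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(Ua, Ub)]
    V = [p + q for p, q in zip(Va, Vb)]
    return U, V


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def train_elm(partitions, n_inputs, n_hidden, C=1e3, seed=0):
    """Random hidden layer, then beta = (H^T H + I/C)^-1 H^T T from merged sums."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    stats = (partition_stats(p, weights, biases) for p in partitions)
    U, V = reduce(merge_stats, stats)
    for i in range(n_hidden):
        U[i][i] += 1.0 / C  # ridge term I/C keeps the L x L system well posed
    beta = solve(U, V)
    return weights, biases, beta


def predict(model, x):
    """Network output: h(x) . beta."""
    weights, biases, beta = model
    return sum(h * b for h, b in zip(hidden_layer(x, weights, biases), beta))


if __name__ == "__main__":
    rows = [([x], math.sin(x)) for x in (-3 + 6 * i / 199 for i in range(200))]
    partitions = [rows[k::4] for k in range(4)]  # stand-in for 4 RDD partitions
    model = train_elm(partitions, n_inputs=1, n_hidden=20)
    mse = sum((predict(model, x) - t) ** 2 for x, t in rows) / len(rows)
    print(f"training MSE: {mse:.5f}")
```

With the same random hidden layer, training on four partitions gives the same weights as training on one partition, up to floating-point rounding. This is because matrix addition is associative. In PySpark, the pattern could roughly be written as `rdd.mapPartitions(lambda it: [partition_stats(list(it), weights, biases)]).reduce(merge_stats)`, with `weights` and `biases` shipped to the workers, for example via a broadcast variable. Only the L×L and length-L sums move between workers, so the communication volume does not grow with the number of training samples.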
