Computer Science ›› 2017, Vol. 44 ›› Issue (2): 65-69.doi: 10.11896/j.issn.1002-137X.2017.02.007

Previous Articles     Next Articles

Research of Distributed Principle Components Analysis Algorithm Based on MapReduce

YI Xiu-shuang, LIU Yong, LI Jie and WANG Xing-wei   

  • Online:2018-11-13 Published:2018-11-13

Abstract: With the population of the parallel framework of MapReduce,the parallelization of various types of Data Mi-ning algorithms is becoming a hot area of research.Principle components analysis(PCA) algorithm is getting more and more attention too.Summarizing the recent research result of the parallelization of PCA,we found that these PCA algorithms are not fully parallelized,especially the process of calculating the eigenvalue of the matrix.Whole process of PCA algorithm is divided into two stages,which are solution of the correlation coefficient matrix and the singular value decomposition of the matrix.Through the combination of the most popular MapReduce parallel framework and the QR decomposition of matrix,a new way to parallel the SVD was proposed in this paper.Analyzing the calculation speed of the parallel algorithm through the experiment on the data set which is consisted of random produced double floating point matrix of different dimensions,the result with the traditional serial algorithm was compared to show the efficiency improvement of the mentioned algorithm.Then we integrated the SVD algorithm into the PCA algorithm,and proposed the parallel computing process of the correlation coefficient matrix,and this will parallel the two stages of PCA algorithm.Subsequently,we conducted the comparison between the existing not fully parallelized PCA algorithm and normal PCA algorithm with the proposed algorithm on different dimensions of the matrix.Then analying the speed-up ratio of the proposed algorithm,we can find that our algorithm will consume less time when processing massive data set.

Key words: Principle components analysis,SVD,MapReduce

[1] Z L H,G Z K.Face Recognition Method Based on Adaptively Weighted Block-Two Dimensional Principal Component Analysis[C]∥Third International Conference on Computational Intelligence,Communication Systems and Networks (CICSyN),2011.IEEE,2011:22-25 .
[2] FAN C,CHEN X.Research of face recognition based on wavelettransform and principal component analysis[C]∥Eighth International Conference on Natural Computation (ICNC),2012.IEEE,2012:575-578.
[3] LIM S T,YAP D F W,MANAP N A.Medical image compression using block-based PCA algorithm[C]∥IEEE 2014 International Conference on Computer,Communications,and Control Technology (I4CT),2014.IEEE,2014:171-175.
[4] JIN C,VECCHIOLA C,BUYYA R.Mrpga:an extension of mapreduce for parallelizing genetic algorithms[C]∥IEEE Fourth International Conference on eScience,2008.IEEE,2008:214-221.
[5] XIA H M,ZHOU Y Q.A new solution to solve the eigenvalue and eigenvector of mateix[J].Computer engineering,2008,6(11):189-191.(in Chinese) 夏慧明,周永权.求解矩阵特征值和特征向量的新方法[J].计算机工程,2008,6(11):189-191.
[6] OORDONEZ C,MOHANAM N,GARCIA-ALVARADO C.PCAfor large data sets with parallel data summarization[J].Distri-buted and Parallel Databases,2014,32(3):377-403.
[7] DEMMEL J W.Applied numerical linear algebra[M].Siam,1997.
[8] BEˇKA M,OKA G,VAJTERIC M.Parallel Block-JacobiSVD Methods[M]∥High-Performance Scientific Computing.Springer London,2012:185-197.
[9] DUMAIS S T.Latent semantic analysis[J].Annual review of information science and technology,2004,38(1):188-230.
[10] ALTER O,BROWN P O,BOTSTEIN D.Singular value decomposition for genome-wide expression data processing and mode-ling[C]∥Proceedings of the National Academy of Sciences,2000,97(18):10101-10106.
[11] MA L,LI C,SONG S.Robustness analysis of watermarkingspectral images with PCA-SVD under various illumination conditions[C]∥IEEE International Conference on Information and Automation (ICIA),2010.IEEE,2010:1835-1839.
[12] ZHAO Y P,LI B,LI Y B,et al.Householder transformationbased sparse least squares support vector regression[J].Neurocomputing,2015,161:243-253.
[13] BALLARD G,DEMMEL J,GRIGORI L,et al.Reconstructing Householder vectors from tall-skinny QR[C]∥2014 IEEE 28th International Parallel and Distributed Processing Symposium,2014.IEEE,2014:1159-1170.
[14] XIONG K,HE Y.Power-effiicent resource allocation in MapReduce clusters[C]∥2013 IFIP/IEEE International Symposium on Integrated Network Management,2013.IEEE,2013:603-608.
[15] ZHANG T,WANG C,LI W.Distributed parallel PCA algorithm in the application of large sample data set[EB/OL].http://www.paper.educn/releasepaper/content/201112-636(in Chinese) 张涛,王纯,李炜.分布式并行PCA算法在大样本数据集中的应用[EB/OL].http://www.paper.educn/releasepaper/content/201112-636.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!