计算机科学 ›› 2017, Vol. 44 ›› Issue (2): 65-69.doi: 10.11896/j.issn.1002-137X.2017.02.007
易秀双,刘勇,李婕,王兴伟
YI Xiu-shuang, LIU Yong, LI Jie and WANG Xing-wei
摘要: 随着MapReduce并行化框架的流行,各种数据挖掘算法的并行化也成为了当下研究的热点。主成分分析(Principle Components Analysis,PCA)算法的并行化也得到了越来越多的关注。通过对目前PCA算法的并行化研究的成果进行总结,发现这些PCA算法并行程度并不完全,特别是特征值计算过程。整个PCA算法流程分为两个阶段:相关系数矩阵求解阶段和矩阵的奇异值分解(Singular Value Decomposition,SVD)阶段。通过当前最流行的并行框架MapReduce,融合矩阵的QR分解,提出了一种奇异值分解的并行实现方法。利用随机产生的不同维度大小的双浮点矩阵比较并行奇异值分解相对传统串行环境下的算法效率的提升情况,并分析算法效率。之后,将并行奇异值分解融合到PCA算法中,同时提出相关系数矩阵的并行计算过程,将PCA计算的两个部分完全并行化。利用不同维度的矩阵对提出的并行PCA算法与已存在的未完全并行PCA算法、常规的PCA算法的运算速度进行比较,分析完全并行化PCA算法的加速比,最终得出所提算法在处理一定规模的大数据情况下的时间消耗要少许多。
[1] Z L H,G Z K.Face Recognition Method Based on Adaptively Weighted Block-Two Dimensional Principal Component Analysis[C]∥Third International Conference on Computational Intelligence,Communication Systems and Networks (CICSyN),2011.IEEE,2011:22-25 . [2] FAN C,CHEN X.Research of face recognition based on wavelettransform and principal component analysis[C]∥Eighth International Conference on Natural Computation (ICNC),2012.IEEE,2012:575-578. [3] LIM S T,YAP D F W,MANAP N A.Medical image compression using block-based PCA algorithm[C]∥IEEE 2014 International Conference on Computer,Communications,and Control Technology (I4CT),2014.IEEE,2014:171-175. [4] JIN C,VECCHIOLA C,BUYYA R.Mrpga:an extension of mapreduce for parallelizing genetic algorithms[C]∥IEEE Fourth International Conference on eScience,2008.IEEE,2008:214-221. [5] XIA H M,ZHOU Y Q.A new solution to solve the eigenvalue and eigenvector of mateix[J].Computer engineering,2008,6(11):189-191.(in Chinese) 夏慧明,周永权.求解矩阵特征值和特征向量的新方法[J].计算机工程,2008,6(11):189-191. [6] OORDONEZ C,MOHANAM N,GARCIA-ALVARADO C.PCAfor large data sets with parallel data summarization[J].Distri-buted and Parallel Databases,2014,32(3):377-403. [7] DEMMEL J W.Applied numerical linear algebra[M].Siam,1997. [8] BEˇKA M,OKA G,VAJTERIC M.Parallel Block-JacobiSVD Methods[M]∥High-Performance Scientific Computing.Springer London,2012:185-197. [9] DUMAIS S T.Latent semantic analysis[J].Annual review of information science and technology,2004,38(1):188-230. [10] ALTER O,BROWN P O,BOTSTEIN D.Singular value decomposition for genome-wide expression data processing and mode-ling[C]∥Proceedings of the National Academy of Sciences,2000,97(18):10101-10106. [11] MA L,LI C,SONG S.Robustness analysis of watermarkingspectral images with PCA-SVD under various illumination conditions[C]∥IEEE International Conference on Information and Automation (ICIA),2010.IEEE,2010:1835-1839. [12] ZHAO Y P,LI B,LI Y B,et al.Householder transformationbased sparse least squares support vector regression[J].Neurocomputing,2015,161:243-253. [13] BALLARD G,DEMMEL J,GRIGORI L,et al.Reconstructing Householder vectors from tall-skinny QR[C]∥2014 IEEE 28th International Parallel and Distributed Processing Symposium,2014.IEEE,2014:1159-1170. [14] XIONG K,HE Y.Power-effiicent resource allocation in MapReduce clusters[C]∥2013 IFIP/IEEE International Symposium on Integrated Network Management,2013.IEEE,2013:603-608. [15] ZHANG T,WANG C,LI W.Distributed parallel PCA algorithm in the application of large sample data set[EB/OL].http://www.paper.educn/releasepaper/content/201112-636(in Chinese) 张涛,王纯,李炜.分布式并行PCA算法在大样本数据集中的应用[EB/OL].http://www.paper.educn/releasepaper/content/201112-636. |
No related articles found! |
|