Computer Science ›› 2018, Vol. 45 ›› Issue (11): 298-303.doi: 10.11896/j.issn.1002-137X.2018.11.048

• Interdiscipline & Frontier • Previous Articles     Next Articles

Parallel CP Tensor Decomposition Algorithm Combining with GPU Technology

WU Yu, YAN Guang-hui, WANG Ya-fei, MA Qing-qing, LIU Yu-xuan   

  1. (School of Electronic and Information Engineering,Lanzhou Jiaotong University,Lanzhou 730030,China)
  • Received:2017-10-20 Published:2019-02-25

Abstract: With the emergence of high-dimensional data,tensor and tensor decomposition have draw widespread attention in the field of data alanlysis.The high dimensionality and sparsity of tensor data results in a high computational complexity of tensor methods,which becomes an obstacle to the application of tensor decomposition in practical.Many researchers have introduced the parallel computing methods to improve the efficiency of tensor decomposition algorithm.Based on the existing research,this paper presented a GPU parallel CP tensor decomposition algorithm through simply calculating Khatri-Rao product,called ParSCP-ALS algorithm.The experimental results show that the proposed ParSCP-ALS algorithm can effectively improve the computational efficiency of CP tensor decomposition compared with the existing parallel algorithm.On the Movielens data set,the ParSCP-ALS algorithm reduces the computational time by about 58% compared with the existing parallel algorithm.

Key words: CP tensor decomposition, CP-ALS, GPU, CUDA

CLC Number: 

  • TP391
[1]LI X T.Research on Key Technologies of Multi-Relational Graph Mining Based on Tensors[D].Harbin:Harbin Institute of Technology,2013.(in Chinese)
[2]YANG B,LIU D Y,LIU J M,et al.Complex Network Clustering Algorithms[J].Journal of Software,2009,20(1):54-66.(in Chinese)
杨博,刘大有,LIU J M,等.复杂网络聚类方法[J].软件学报,2009,20(1):54-66.
[3]MØRUP M.Applications of tensor (multiway array) factorizations and decompositions in data mining[J].Wiley Interdisciplinary Reviews Data Mining & Knowledge Discovery,2011,1(1):24-40.
[4]HITCHCOCK F L.The expression of a tensor or a polyadic as a sum of products[J].Journal of Mathematics and Physics,1927,6(1-4):164-189.
[5]KOLDA T G,BADER B W.Tensor Decompositions and Applications[J].Siam Review,2009,51(3):455-500.
[6]HARSHMAN R A.Foundations of the PARAFAC procedure:Model and conditions for an ‘explanatory’multi-mode factor analysis [J].UCLA Working Papers in Phonetics,1970,16:1-84.
[7]CARROLL J D,PRUZANSKY S,KRUSKAL J B.CAN- DELINC:a general approach to multi dimension analysis of many-way arrays with linear constraints on parameters[J].Psychometrika,1980,45(1):3-24.
[8]PAPALEXAKIS E E,FALOUTSOS C,SIDIROPOULOS N D.ParCube:Sparse Parallelizable Tensor Decompositions[J].Acm Transactions on Knowledge Discovery from Data,2012,10(1):521-536.
[9]ANTIKAINEN J,HAVEL J,JOSTH R,et al.Nonnegative Tensor Factorization Accelerated Using GPGPU[J].IEEE Transactions on Parallel & Distributed Systems,2011,22(7):1135-1144.
[10]ZOU B,LI C,TAN L,et al.GPUTENSOR:Efficient tensor factorization for context-aware recommendations[J].Information Sciences,2015,299:159-177.
[11]KANG U,PAPALEXAKIS E,HARPALE A,et al.GigaTensor:scaling tensor analysis up by 100 times-algorithms and discoveries[C]∥ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2012:316-324.
[12]COOK S.CUDA Programming-A Developer’s Guide to Parallel Computing with GPUs[M].Beijing:China Mechine Press,2014.(in Chinese)
COOK S.CUDA并行程序设计[M].北京:机械工业出版社,2014.
[13]BELL N,GARLAND M.Efficient Sparse Matrix-Vector Multiplication on CUDA[OL].
[14]LEY M.DBLP:some lessons learned[J].Proceedings of the VLDB Endowment,2009,2(2):1493-1500.
[15]HARPER F M,KONSTAN J A.The MovieLens Datasets:History and Context[J].ACM Transactions on Internactive Intelligent Systmes(TiiS),2015,5(4):19.
[1] WANG Liang, ZHOU Xin-zhi, YNA Hua. Real-time SIFT Algorithm Based on GPU [J]. Computer Science, 2020, 47(8): 105-111.
[2] LIU Shi-fang, ZHAO Yong-hua, YU Tian-yu, HUANG Rong-feng. Efficient Implementation of Generalized Dense Symmetric Eigenproblem StandardizationAlgorithm on GPU Cluster [J]. Computer Science, 2020, 47(4): 6-12.
[3] ZUO Xian-yu, ZHANG Zhe, SU Yue-han, LIU Yang, GE Qiang, TIAN Jun-feng. Extraction Algorithm of NDVI Based on GPU Multi-stream Parallel Model [J]. Computer Science, 2020, 47(4): 25-29.
[4] XU Xin-peng, HU Bin-xing. Fast Calculation Method of Aircraft Component Strength Check Based on ICCG [J]. Computer Science, 2020, 47(11A): 624-627.
[5] KANG Lin-yao, TANG Bing, XIA Yan-min, ZHANG Li. GPU-accelerated Non-negative Matrix Factorization-based Parallel Collaborative Filtering Recommendation Algorithm [J]. Computer Science, 2019, 46(8): 106-110.
[6] ZHENG Hong-bo, SHI Hao, DU Yi-cheng, ZHANG Mei-yu, QIN Xu-jia. Fast Stripe Extraction Method for Structured Light Images with Uneven Illumination [J]. Computer Science, 2019, 46(5): 272-278.
[7] ZHANG Yi-ran, CHEN Long, AN Xiang-zhe, YAN Shen-gen. Study on Performance Optimization of Reduction Algorithm Targeting GPU Computing Platform [J]. Computer Science, 2019, 46(2): 306-314.
[8] LU Xian-hua, WANG Hong-jun. Design of Distributed News Clustering System Based on Big Data Computing Framework [J]. Computer Science, 2019, 46(11A): 220-223.
[9] ZHU Chao, WU Su-ping. Parallel Harris Feature Point Detection Algorithm [J]. Computer Science, 2019, 46(11A): 289-293.
[10] SU Qing-hua, FU Jing-chao, GU Han, ZHANG Shan-shan, LI Yi-fei, JIANG Fang-zhou, BAI Han-lin, ZHAO Di. Parallel Algorithm Design for Assisted Diagnosis of Prostate Cancer [J]. Computer Science, 2019, 46(11A): 524-527.
[11] LIU Yu-cheng, Richard·DING, ZHANG Ying-chao. Research on Pan-real-time Problem of Medical Detection Based on BPNNs Recognition Algorithm [J]. Computer Science, 2018, 45(6): 301-307.
[12] ZHANG Jie, WEN Min-hua, Jame LIN, MENG De-long and LU Hao. Implementation and Optimization of Historical VaR on GPU [J]. Computer Science, 2018, 45(5): 291-294.
[13] LIAO Xing, YUAN Jing-ling and CHEN Min-cheng. Parallel PSO Container Packing Algorithm with Adaptive Weight [J]. Computer Science, 2018, 45(3): 231-234.
[14] ZHOU Yun, JIANG Fu. Improved Marching Cubes Based on CUDA [J]. Computer Science, 2018, 45(11A): 573-575.
[15] LIU Duan-yang, ZHENG Jiang-fan, SHEN Guo-jiang, LIU Zhi. Study on Parallel K-means Algorithm Based on CUDA [J]. Computer Science, 2018, 45(11): 292-297.
Full text



[1] . [J]. Computer Science, 2018, 1(1): 1 .
[2] LEI Li-hui and WANG Jing. Parallelization of LTL Model Checking Based on Possibility Measure[J]. Computer Science, 2018, 45(4): 71 -75 .
[3] SUN Qi, JIN Yan, HE Kun and XU Ling-xuan. Hybrid Evolutionary Algorithm for Solving Mixed Capacitated General Routing Problem[J]. Computer Science, 2018, 45(4): 76 -82 .
[4] ZHANG Jia-nan and XIAO Ming-yu. Approximation Algorithm for Weighted Mixed Domination Problem[J]. Computer Science, 2018, 45(4): 83 -88 .
[5] WU Jian-hui, HUANG Zhong-xiang, LI Wu, WU Jian-hui, PENG Xin and ZHANG Sheng. Robustness Optimization of Sequence Decision in Urban Road Construction[J]. Computer Science, 2018, 45(4): 89 -93 .
[6] SHI Wen-jun, WU Ji-gang and LUO Yu-chun. Fast and Efficient Scheduling Algorithms for Mobile Cloud Offloading[J]. Computer Science, 2018, 45(4): 94 -99 .
[7] ZHOU Yan-ping and YE Qiao-lin. L1-norm Distance Based Least Squares Twin Support Vector Machine[J]. Computer Science, 2018, 45(4): 100 -105 .
[8] LIU Bo-yi, TANG Xiang-yan and CHENG Jie-ren. Recognition Method for Corn Borer Based on Templates Matching in Muliple Growth Periods[J]. Computer Science, 2018, 45(4): 106 -111 .
[9] GENG Hai-jun, SHI Xin-gang, WANG Zhi-liang, YIN Xia and YIN Shao-ping. Energy-efficient Intra-domain Routing Algorithm Based on Directed Acyclic Graph[J]. Computer Science, 2018, 45(4): 112 -116 .
[10] CUI Qiong, LI Jian-hua, WANG Hong and NAN Ming-li. Resilience Analysis Model of Networked Command Information System Based on Node Repairability[J]. Computer Science, 2018, 45(4): 117 -121 .