计算机科学 ›› 2018, Vol. 45 ›› Issue (11): 298-303.doi: 10.11896/j.issn.1002-137X.2018.11.048

• 交叉与前沿 • 上一篇    下一篇

结合GPU技术的并行CP张量分解算法

武昱, 闫光辉, 王雅斐, 马青青, 刘宇轩   

  1. (兰州交通大学电子与信息工程学院 兰州730030)
  • 收稿日期:2017-10-20 发布日期:2019-02-25
  • 作者简介:武 昱(1993-),男,硕士,主要研究方向为张量分解、并行计算;闫光辉(1970-),男,教授,博士生导师,CCF会员,主要研究方向为数据挖掘、复杂网络、社交网络等,E-mail:yan_guanghui@163.com(通信作者);王雅斐(1992-),女,硕士,主要研究方向为社区发现、节点重要性;马青青(1993-),女,硕士,主要研究方向为社区发现、链路预测;刘宇轩(1992-),男,硕士,主要研究方向为社区发现、离群点检测。
  • 基金资助:
    本文受国家自然科学基金项目(61662066,61163010)资助。

Parallel CP Tensor Decomposition Algorithm Combining with GPU Technology

WU Yu, YAN Guang-hui, WANG Ya-fei, MA Qing-qing, LIU Yu-xuan   

  1. (School of Electronic and Information Engineering,Lanzhou Jiaotong University,Lanzhou 730030,China)
  • Received:2017-10-20 Published:2019-02-25

摘要: 随着高维数据的涌现,张量和张量分解方法在数据分析领域中受到了广泛关注。然而,张量数据的高维度和稀疏特性,导致算法的复杂度较高,阻碍了张量分解算法在实际中的应用。许多学者通过引入并行计算来提升张量分解算法的计算效率。在现有研究的基础上,给出一种简化计算Khatri-Rao乘积的GPU并行CP张量分解算法,称为ParSCP-ALS。在模拟数据集和真实数据集上的实验结果显示,相比现有并行算法,文中设计的ParSCP-ALS算法能有效提高CP张量分解的计算效率,其中在Movielens数据集上的计算时间减少了约58%。

关键词: CP-ALS算法, CP张量分解, CUDA, GPU

Abstract: With the emergence of high-dimensional data,tensor and tensor decomposition have draw widespread attention in the field of data alanlysis.The high dimensionality and sparsity of tensor data results in a high computational complexity of tensor methods,which becomes an obstacle to the application of tensor decomposition in practical.Many researchers have introduced the parallel computing methods to improve the efficiency of tensor decomposition algorithm.Based on the existing research,this paper presented a GPU parallel CP tensor decomposition algorithm through simply calculating Khatri-Rao product,called ParSCP-ALS algorithm.The experimental results show that the proposed ParSCP-ALS algorithm can effectively improve the computational efficiency of CP tensor decomposition compared with the existing parallel algorithm.On the Movielens data set,the ParSCP-ALS algorithm reduces the computational time by about 58% compared with the existing parallel algorithm.

Key words: CP tensor decomposition, CP-ALS, CUDA, GPU

中图分类号: 

  • TP391
[1]LI X T.Research on Key Technologies of Multi-Relational Graph Mining Based on Tensors[D].Harbin:Harbin Institute of Technology,2013.(in Chinese)
李旭涛.基于张量的多关系图挖掘关键技术研究[D].哈尔滨:哈尔滨工业大学,2013.
[2]YANG B,LIU D Y,LIU J M,et al.Complex Network Clustering Algorithms[J].Journal of Software,2009,20(1):54-66.(in Chinese)
杨博,刘大有,LIU J M,等.复杂网络聚类方法[J].软件学报,2009,20(1):54-66.
[3]MØRUP M.Applications of tensor (multiway array) factorizations and decompositions in data mining[J].Wiley Interdisciplinary Reviews Data Mining & Knowledge Discovery,2011,1(1):24-40.
[4]HITCHCOCK F L.The expression of a tensor or a polyadic as a sum of products[J].Journal of Mathematics and Physics,1927,6(1-4):164-189.
[5]KOLDA T G,BADER B W.Tensor Decompositions and Applications[J].Siam Review,2009,51(3):455-500.
[6]HARSHMAN R A.Foundations of the PARAFAC procedure:Model and conditions for an ‘explanatory’multi-mode factor analysis [J].UCLA Working Papers in Phonetics,1970,16:1-84.
[7]CARROLL J D,PRUZANSKY S,KRUSKAL J B.CAN- DELINC:a general approach to multi dimension analysis of many-way arrays with linear constraints on parameters[J].Psychometrika,1980,45(1):3-24.
[8]PAPALEXAKIS E E,FALOUTSOS C,SIDIROPOULOS N D.ParCube:Sparse Parallelizable Tensor Decompositions[J].Acm Transactions on Knowledge Discovery from Data,2012,10(1):521-536.
[9]ANTIKAINEN J,HAVEL J,JOSTH R,et al.Nonnegative Tensor Factorization Accelerated Using GPGPU[J].IEEE Transactions on Parallel & Distributed Systems,2011,22(7):1135-1144.
[10]ZOU B,LI C,TAN L,et al.GPUTENSOR:Efficient tensor factorization for context-aware recommendations[J].Information Sciences,2015,299:159-177.
[11]KANG U,PAPALEXAKIS E,HARPALE A,et al.GigaTensor:scaling tensor analysis up by 100 times-algorithms and discoveries[C]∥ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2012:316-324.
[12]COOK S.CUDA Programming-A Developer’s Guide to Parallel Computing with GPUs[M].Beijing:China Mechine Press,2014.(in Chinese)
COOK S.CUDA并行程序设计[M].北京:机械工业出版社,2014.
[13]BELL N,GARLAND M.Efficient Sparse Matrix-Vector Multiplication on CUDA[OL].http://www.users.csbsju.edu/~mheroux/fau2013_csci317/SPMV-Garland-Bell.pdf.
[14]LEY M.DBLP:some lessons learned[J].Proceedings of the VLDB Endowment,2009,2(2):1493-1500.
[15]HARPER F M,KONSTAN J A.The MovieLens Datasets:History and Context[J].ACM Transactions on Internactive Intelligent Systmes(TiiS),2015,5(4):19.
[1] 宗迪迪, 谢益武.
基于法线迭代的模型中轴生成方法
Model Medial Axis Generation Method Based on Normal Iteration
计算机科学, 2022, 49(6A): 764-770. https://doi.org/10.11896/jsjkx.210400050
[2] 汪晋, 刘江.
基于GPU的并行DILU预处理技术
GPU-based Parallel DILU Preconditioning Technique
计算机科学, 2022, 49(6): 108-118. https://doi.org/10.11896/jsjkx.210300259
[3] 李繁, 严星, 张晓宇.
基于GPU的特征脸算法优化研究
Optimization of GPU-based Eigenface Algorithm
计算机科学, 2021, 48(4): 197-204. https://doi.org/10.11896/jsjkx.200600033
[4] 胡蓉, 阳王东, 王昊天, 罗辉章, 李肯立.
基于GPU加速的并行WMD算法
Parallel WMD Algorithm Based on GPU Acceleration
计算机科学, 2021, 48(12): 24-28. https://doi.org/10.11896/jsjkx.210600213
[5] 文敏华, 汪申鹏, 韦建文, 李林颖, 张斌, 林新华.
基于DGX-2的湍流燃烧问题优化研究
DGX-2 Based Optimization of Application for Turbulent Combustion
计算机科学, 2021, 48(12): 43-48. https://doi.org/10.11896/jsjkx.201200129
[6] 汪亮, 周新志, 严华.
基于GPU的实时SIFT算法
Real-time SIFT Algorithm Based on GPU
计算机科学, 2020, 47(8): 105-111. https://doi.org/10.11896/jsjkx.190700036
[7] 刘世芳, 赵永华, 于天禹, 黄荣锋.
广义稠密对称特征问题标准化算法在GPU集群上的有效实现
Efficient Implementation of Generalized Dense Symmetric Eigenproblem StandardizationAlgorithm on GPU Cluster
计算机科学, 2020, 47(4): 6-12. https://doi.org/10.11896/jsjkx.191000009
[8] 左宪禹, 张哲, 苏岳瀚, 刘扬, 葛强, 田军锋.
基于GPU多流并发并行模型的NDVI提取算法
Extraction Algorithm of NDVI Based on GPU Multi-stream Parallel Model
计算机科学, 2020, 47(4): 25-29. https://doi.org/10.11896/jsjkx.190500029
[9] 许新鹏, 胡斌星.
基于ICCG法的飞行器部件强度校核快速计算方法
Fast Calculation Method of Aircraft Component Strength Check Based on ICCG
计算机科学, 2020, 47(11A): 624-627. https://doi.org/10.11896/jsjkx.191100154
[10] 康林瑶, 唐兵, 夏艳敏, 张黎.
基于GPU加速和非负矩阵分解的并行协同过滤推荐算法
GPU-accelerated Non-negative Matrix Factorization-based Parallel Collaborative Filtering Recommendation Algorithm
计算机科学, 2019, 46(8): 106-110. https://doi.org/10.11896/j.issn.1002-137X.2019.08.017
[11] 郑红波, 石豪, 杜轶诚, 张美玉, 秦绪佳.
光照不均匀的结构光图像的条纹快速提取方法
Fast Stripe Extraction Method for Structured Light Images with Uneven Illumination
计算机科学, 2019, 46(5): 272-278. https://doi.org/10.11896/j.issn.1002-137X.2019.05.042
[12] 张逸然, 陈龙, 安向哲, 颜深根.
面向GPU计算平台的归约算法的性能优化研究
Study on Performance Optimization of Reduction Algorithm Targeting GPU Computing Platform
计算机科学, 2019, 46(2): 306-314. https://doi.org/10.11896/j.issn.1002-137X.2019.02.047
[13] 邓定胜.
一种基于可编程GPU的实时烟雾模拟算法研究
Study on Real-time Smoke Simulation Algorithm Based on Programmable GPU
计算机科学, 2019, 46(11A): 604-608.
[14] 苏庆华, 付景超, 谷焓, 张姗姗, 李奕飞, 江方舟, 白翰林, 赵地.
前列腺癌辅助诊断GPU并行算法设计
Parallel Algorithm Design for Assisted Diagnosis of Prostate Cancer
计算机科学, 2019, 46(11A): 524-527.
[15] 卢献华, 王洪俊.
基于大数据计算框架的分布式新闻聚类系统设计
Design of Distributed News Clustering System Based on Big Data Computing Framework
计算机科学, 2019, 46(11A): 220-223.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!