Implementation and Performance Evaluation of Recommender Algorithms Based on Multi-/Many-core Platforms


Computer Science, 2017, Vol. 44, Issue (10): 71-74. doi: 10.11896/j.issn.1002-137X.2017.10.013

2016 National Annual Conference on High Performance Computing


CHEN Jing, FANG Jian-bin, TANG Tao, YANG Can-qun

College of Computer, National University of Defense Technology, Changsha 410073 (all four authors)

Online: 2018-12-01  Published: 2018-12-01
Funding: This work was supported by the National Natural Science Foundation of China (61170049, 61402488, 4, 61602501) and the National 863 Program (2015AA01A301).




Abstract

Using the OpenCL standard, we designed and implemented two classic recommender-system algorithms, alternating least squares (ALS) and cyclic coordinate descent (CCD), and evaluated them on multi-/many-core platforms: Intel CPUs, NVIDIA GPUs and the Intel MIC. We investigated two factors that affect performance on these platforms: the latent feature dimension and the number of threads. We also compared the OpenCL implementations with CUDA and OpenMP implementations. Under the same conditions, CCD achieves higher accuracy and converges faster and more steadily than ALS, but takes longer to train. The OpenCL implementations of both algorithms perform at least as well as the CUDA implementations on the GPU (speedups of 1.03x for CCD and 1.2x for ALS) and the OpenMP implementations with 16 threads on the CPU (speedups of roughly 1.6~1.7x for both algorithms). Across platforms, both algorithms run faster on the CPU than on the GPU or the MIC.

Key words: Recommender system, OpenCL, ALS, CCD
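The abstract names ALS and CCD but does not show their update rules. As a hedged illustration only, the following is a minimal sequential Python sketch of both, assuming the common L2-regularized objective sum over observed (i, j) of (r_ij - w_i·h_j)^2 + λ(Σ‖w_i‖² + Σ‖h_j‖²) with a plain, unweighted λ (an assumption; the paper's exact objective is not given here). It is not the authors' OpenCL, CUDA or OpenMP code: `train`, `als_update`, `ccd_update`, `solve` and the toy ratings are hypothetical. The parameter `k` plays the role of the latent feature dimension studied in the paper.

```python
"""Sequential sketch of ALS and CCD for L2-regularized matrix factorization (hypothetical names).
Assumed objective: sum_{(i,j) observed} (r_ij - w_i.h_j)^2 + lam*(sum ||w_i||^2 + sum ||h_j||^2)."""
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve a x = b (small dense SPD system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for q in range(c, n + 1):
                m[r][q] -= f * m[c][q]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][q] * x[q] for q in range(r + 1, n))) / m[r][r]
    return x


def als_update(rows, fixed, k, lam):
    """ALS half-step: each row solves (H^T H + lam*I) w = H^T r over its observed entries only."""
    out = []
    for obs in rows:  # obs: list of (index into `fixed`, rating)
        a = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for j, r in obs:
            h = fixed[j]
            for p in range(k):
                b[p] += r * h[p]
                for q in range(k):
                    a[p][q] += h[p] * h[q]
        out.append(solve(a, b))
    return out


def ccd_update(rows, mine, fixed, k, lam):
    """CCD half-step: exactly minimize over one scalar mine[i][t] at a time, cycling t = 0..k-1,
    while keeping residuals e_ij = r_ij - w_i.h_j up to date. Updates `mine` in place."""
    for i, obs in enumerate(rows):
        w = mine[i]
        res = [r - dot(w, fixed[j]) for j, r in obs]
        for t in range(k):
            num = den = 0.0
            for (j, _), e in zip(obs, res):
                h = fixed[j][t]
                num += (e + w[t] * h) * h  # residual with coordinate t removed, times h_jt
                den += h * h
            new = num / (lam + den)
            for n, (j, _) in enumerate(obs):
                res[n] -= (new - w[t]) * fixed[j][t]  # uses the old w[t]
            w[t] = new


def rmse(ratings, W, H):
    return (sum((r - dot(W[i], H[j])) ** 2 for i, j, r in ratings) / len(ratings)) ** 0.5


def train(ratings, m, n, k=2, lam=0.1, iters=10, method="als", seed=0):
    """k is the latent feature dimension; returns (W, H, training RMSE after each iteration)."""
    rng = random.Random(seed)
    W = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(m)]
    H = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n)]
    by_user = [[] for _ in range(m)]
    by_item = [[] for _ in range(n)]
    for i, j, r in ratings:
        by_user[i].append((j, r))
        by_item[j].append((i, r))
    history = [rmse(ratings, W, H)]
    for _ in range(iters):
        if method == "als":
            W = als_update(by_user, H, k, lam)
            H = als_update(by_item, W, k, lam)
        else:  # "ccd"
            ccd_update(by_user, W, H, k, lam)
            ccd_update(by_item, H, W, k, lam)
        history.append(rmse(ratings, W, H))
    return W, H, history


if __name__ == "__main__":
    # Hypothetical toy data: (user, item, rating) triples for 4 users x 3 items.
    data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
            (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
    for method in ("als", "ccd"):
        _, _, hist = train(data, m=4, n=3, method=method, iters=20)
        print(method, "initial RMSE", round(hist[0], 3), "final RMSE", round(hist[-1], 3))
```

Within each half-step, every user's (or item's) update reads only the other side's fixed factors, so rows can be processed independently; that independence is what a multi-/many-core implementation can map onto threads or OpenCL work-items. ALS pays for forming and solving a k×k system per row, whereas CCD performs only scalar updates but must sweep the k coordinates of a row in order.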


CHEN Jing, FANG Jian-bin, TANG Tao and YANG Can-qun. Implementation and Performance Evaluation of Recommender Algorithms Based on Multi-/Many-core Platforms[J]. Computer Science, 2017, 44(10): 71-74. https://doi.org/10.11896/j.issn.1002-137X.2017.10.013
