计算机科学 ›› 2013, Vol. 40 ›› Issue (3): 62-67.

• 2012多值逻辑专栏 • 上一篇    下一篇

基于Fermi架构的Join算法

李观钊,陈思桐,甄 真,陈 虎   

  1. (华南理工大学计算机科学与工程学院 广州510006);(华南理工大学软件学院 广州510006)
  • 出版日期:2018-11-16 发布日期:2018-11-16

.loin Algorithms Based on Fermi Architecture

  • Online:2018-11-16 Published:2018-11-16

摘要: 在列数据库中,连接操作依然是最核心和最耗时的操作,GPU强大的计算能力可为此提供新的优化手段。基于Fermi架构,提出了新的Hash Join算法和Sort merge Join算法,其基本思想是充分利用该架构新增的缓存结构来减少连接操作的cache缺失率。与CUDA stream技术相结合,新算法在输出结果较多时可以有效地隐藏主存与显存间数据传输带来的延迟,进一步提升其执行效率。实验结果证实了基于Fcrmi架构的Hash Join算法处理偏抖数据的高效性及Sort merge Join算法的稳定性,并且通过比较表明,这两种算法的性能全面优于基于多核CPU充分优化的Join算法,最大加速2.4倍,在外键分布高偏抖时新的Hash Join算法的执行速度甚至达到每秒217M元组。

关键词: Join算法,Fermi架构,缓存,CUDA stream

Abstract: In column database, the join operation is still the most important and the most timcconsuming operation.UPUs' powerful computing capabilities can provide new optimizing solutions. This paper proposed a new Hash Join algorithm and a new Sort merge Join which are both based on Fermi architecture. These two new implementation approaches take full advantage of the new parallel cache hierarchy of Fermi, and successfully reduce the cache miss rate of the join operation. Combined with CUDA stream technology, the new join algorithms can effectively hide the data transfer delay between the main memory and global memory, when the join operation generates a large number of results. The experimental results show that Hash Join based on Fermi performs better when it deals with data skew, while Sort merge Join based on Fermi is more stable. Comparing with the implementations based on multi-core CPUs, these two algorithms are faster, speed up 2.4 times at most, and the new Hash Join even achieves 217M tuples per second when foreign key's dataset is high skew.

Key words: Join algorithm, Fermi architecture, Cache, CUDA stream

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!