Computer Science, 2012, Vol. 39, Issue (4): 282-286.

• Architecture •

Design and Implementation for PLASMA Auto-tuning and Performance Optimizing

Lü Jianchun, Zhang Yunquan, Wang Ting, Xiao Xuanji

  Parallel Computing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190; Graduate University of Chinese Academy of Sciences, Beijing 100190
  • Online: 2018-11-16  Published: 2018-11-16

Abstract: PLASMA is a high-performance linear algebra package. Its innovations, namely a block data layout based on tiling, fine-grained parallelism, and out-of-order execution, greatly improve program performance. However, some problems remain: performance is highly sensitive to the tile (block) size, and the tile mechanism introduces a large amount of data copying. By comparing the implementation mechanisms of traditional LAPACK and of PLASMA, this paper analyzes the advantages and disadvantages of PLASMA and proposes two methods to make up for its shortcomings. For the PLASMA architecture, based on extensive testing and analysis, we propose the concept of the marginal (edge) matrix, analyze its impact on performance, and derive an auto-tuning method from it. In addition, we further improve PLASMA's performance by overlapping data copying with computation. Finally, extensive tests verify the effect of these optimizations.

Key words: LAPACK, PLASMA, Auto-tuning, Optimization
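The abstract links performance to the tile size and to the "marginal" (edge) matrices. The paper's own tuning model is not reproduced here. The following is a minimal, hypothetical Python sketch that assumes the marginal matrix means the strip of smaller edge tiles left over when the matrix order N is not a multiple of the tile size nb, and assumes a square N x N matrix. `tile_partition`, `edge_fraction`, and `pick_tile_size` are illustrative names, not PLASMA API. The `benchmark` callback stands in for timing a real PLASMA routine.

```python
from typing import Callable, Iterable, Optional, Tuple


def tile_partition(n: int, nb: int) -> Tuple[int, int]:
    """Split order n into nb-wide tiles: (full tiles per dimension, edge width)."""
    return n // nb, n % nb


def edge_fraction(n: int, nb: int) -> float:
    """Fraction of the n*n entries that fall in edge tiles (0.0 if nb divides n)."""
    _, r = tile_partition(n, nb)
    interior = n - r  # rows/columns covered by full nb x nb tiles
    return 1.0 - (interior * interior) / (n * n)


def pick_tile_size(
    n: int,
    candidates: Iterable[int],
    benchmark: Optional[Callable[[int, int], float]] = None,
) -> int:
    """Choose a tile size for an n x n problem (hypothetical heuristic).

    If `benchmark(n, nb)` is given, it should return a measured run time and the
    fastest candidate wins (empirical auto-tuning). Otherwise fall back to a
    static model: fewest entries in edge tiles, ties broken by the larger tile.
    """
    valid = [nb for nb in candidates if 0 < nb <= n]
    if not valid:
        raise ValueError("no candidate tile size fits the matrix")
    if benchmark is not None:
        return min(valid, key=lambda nb: benchmark(n, nb))
    return min(valid, key=lambda nb: (edge_fraction(n, nb), -nb))


if __name__ == "__main__":
    for nb in (128, 160, 200, 256):
        full, edge = tile_partition(1000, nb)
        print(f"nb={nb}: {full} full tiles/dim, edge width {edge}, "
              f"edge fraction {edge_fraction(1000, nb):.3f}")
    print("chosen nb:", pick_tile_size(1000, (128, 160, 200, 256)))
```

For N = 1000, nb = 200 leaves no edge tiles. By contrast, nb = 256 puts about 41% of the entries into edge tiles, and nb = 128 puts about 20% there. The static model ignores cache fit and the number of parallel tasks, which also depend on nb. For that reason the sketch prefers measured timings whenever a benchmark is supplied.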
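PLASMA stores matrices in a tile layout, so LAPACK-style column-major input has to be copied into tiles. The abstract proposes running these copies in parallel with computation. Below is a minimal sketch of that overlap as a bounded producer/consumer pipeline. It assumes an N x N column-major matrix held in a flat Python list. `copy_tile` and `pipelined` are hypothetical helpers, not PLASMA functions, and `compute` stands in for a tile kernel.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple

Tile = List[List[float]]


def copy_tile(a: List[float], n: int, nb: int, ti: int, tj: int) -> Tile:
    """Copy tile (ti, tj) out of an n x n column-major array `a`.

    Entry (i, j) of a column-major matrix with leading dimension n is a[i + j*n].
    Tiles in the last tile row/column are smaller when n % nb != 0.
    """
    rows = range(ti * nb, min((ti + 1) * nb, n))
    cols = range(tj * nb, min((tj + 1) * nb, n))
    return [[a[i + j * n] for j in cols] for i in rows]


def pipelined(
    a: List[float],
    n: int,
    nb: int,
    compute: Callable[[int, int, Tile], None],
    depth: int = 4,
) -> None:
    """Copy tiles on a background thread while the calling thread computes.

    The bounded queue lets the copier run at most `depth` tiles ahead.
    """
    nt = (n + nb - 1) // nb  # tiles per dimension, including edge tiles
    q: "queue.Queue[Optional[Tuple[int, int, Tile]]]" = queue.Queue(maxsize=depth)

    def copier() -> None:
        for tj in range(nt):
            for ti in range(nt):
                q.put((ti, tj, copy_tile(a, n, nb, ti, tj)))
        q.put(None)  # sentinel: all tiles have been copied

    worker = threading.Thread(target=copier, daemon=True)
    worker.start()
    while (item := q.get()) is not None:
        compute(*item)  # runs while the copier prepares the next tiles
    worker.join()


if __name__ == "__main__":
    n, nb = 5, 2
    a = [float(k) for k in range(n * n)]  # column-major 5 x 5 matrix
    sums = []
    pipelined(a, n, nb, lambda ti, tj, t: sums.append(sum(map(sum, t))))
    print(len(sums), sum(sums))
```

Under CPython's global interpreter lock, this shows the structure of the pipeline rather than a real speedup. In a C implementation with native tile kernels, the copy and the computation could run concurrently. The bounded queue caps how many copied tiles are held in flight at once.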
