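Abstract: Multicore processors can improve the performance of multithreaded programs, but the large body of existing single-threaded programs cannot benefit from them, and programmers remain accustomed to writing single-threaded code. Automatic parallelization is an important means of porting single-threaded programs to multicore platforms, but traditional automatic parallelization techniques perform poorly when a loop contains data dependences that cannot be determined or complex control flow. For loops on which traditional automatic parallelization fails, Ottoni et al. proposed the Decoupled Software Pipelining (DSWP) algorithm to achieve fine-grained, instruction-level parallelism. However, DSWP requires a deep understanding of the processor architecture as well as hardware support for inter-core communication queues and dedicated instructions, which limits both its parallel performance and its applicability. DSWP parallelization implemented on the OpenMP application programming interface does not depend on hardware support for inter-core communication queues or dedicated instructions and is not tied to a particular platform, but the existing OpenMP task scheduling mechanism cannot meet the task scheduling requirements of DSWP parallelization. This work extends the existing OpenMP task scheduling mechanism with an attribute that binds tasks to threads, which ensures the correct execution of OpenMP-based DSWP parallel programs. Support for the task-binding attribute clause is added to libgomp, the OpenMP runtime library of GCC, and the extended GCC serves as the base compiler for OpenMP DSWP programs, providing support for automatic parallelization. Tests on the NPB3.3.1 benchmark suite show that loops on which traditional automatic parallelization fails achieve an average speedup of more than 1.23 on a dual-core processor after OpenMP DSWP automatic parallelization. Compared with programs produced by the Intel compiler and the Open64 compiler using only traditional automatic parallelization, parallel programs generated by the Open64 compiler with the added OpenMP DSWP algorithm achieve average speedups that are 22% and 26% higher, respectively.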
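The pipelining idea behind DSWP can be illustrated with a minimal sketch. This is not the OpenMP/libgomp implementation described above; it is a plain Python analogue using one software queue in place of the inter-core communication queue. The loop (a linked-list traversal whose pointer chasing is a loop-carried dependence) is split into two stages, and each stage runs on its own dedicated thread, loosely mirroring the requirement that a pipeline stage stay bound to one thread. All names (`Node`, `traverse_stage`, `compute_stage`, `dswp_sum_of_squares`, `queue_depth`) are hypothetical and chosen for illustration only.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream between stages


class Node:
    """Singly linked list node; following .next is the loop-carried dependence."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_list(values):
    """Build a linked list holding the given values in order."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def traverse_stage(head, out_q):
    """Stage 1: walk the list (sequential part) and forward each value."""
    node = head
    while node is not None:
        out_q.put(node.value)  # blocks when the bounded queue is full
        node = node.next
    out_q.put(SENTINEL)


def compute_stage(in_q, results):
    """Stage 2: consume values and perform the per-iteration work."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        results.append(item * item)


def dswp_sum_of_squares(values, queue_depth=32):
    """Run both stages concurrently, one dedicated thread per stage."""
    q = queue.Queue(maxsize=queue_depth)
    results = []
    producer = threading.Thread(target=traverse_stage, args=(build_list(values), q))
    consumer = threading.Thread(target=compute_stage, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return sum(results)
```

Calling `dswp_sum_of_squares([1, 2, 3, 4])` sums the squares computed by the second stage. Because data flows in one direction only, the traversal stage can run ahead of the compute stage by up to `queue_depth` items. In CPython the global interpreter lock prevents a real speedup, so the sketch shows the structure of the decomposition rather than its performance.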