Computer Science ›› 2013, Vol. 40 ›› Issue (9): 38-43.


Extension to OpenMP Task Scheduling Mechanism for DSWP Parallelization and Its Implementation

LIU Xiao-xian,ZHAO Rong-cai and DING Rui   

  • Online:2018-11-16 Published:2018-11-16

Abstract: Multicore processors increase throughput for multiprogrammed and multithreaded codes, but many important applications are single-threaded and therefore do not benefit. Automatic parallelization techniques play an important role in migrating single-threaded applications to multicore platforms. Unfortunately, the prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders existing techniques unsuitable. Ottoni et al. proposed an automatic parallelization algorithm called Decoupled Software Pipelining (DSWP) to exploit fine-grained pipeline parallelism at the instruction level, but it requires knowledge of micro-architectural properties as well as hardware support in the form of a communication channel and two special instructions. The improved DSWP algorithm based on OpenMP increases the parallel granularity and no longer relies on hardware support, but the existing OpenMP task scheduling mechanism cannot satisfy the needs of DSWP. A new binding clause for the OpenMP task construct is proposed to extend the task scheduling mechanism; it guarantees the correctness of OpenMP DSWP parallelization. The new clause is implemented in libgomp, the GCC runtime library, which provides support for compiling OpenMP DSWP programs. Experimental results show that loops which existing techniques fail to parallelize can be parallelized by the improved automatic parallelization algorithm, with significant performance improvement on a dual-core CPU: the average speedup reaches 1.23. Compared with the Intel and Open64 compilers, the compiler with the improved algorithm clearly improves execution efficiency, and the average speedup of the OpenMP DSWP programs it generates is higher by more than 22% and 26%, respectively.
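The pipeline structure that DSWP produces can be illustrated with a minimal software-only sketch. It is not the authors' OpenMP implementation and does not use the proposed binding clause; it is written in Python for brevity, and every name in it (`Node`, `stage_traverse`, `stage_compute`, `dswp_sum_of_squares`, the `capacity` parameter) is hypothetical. A loop over a linked list is split into two stages: the first carries the pointer-chasing recurrence, the second does the per-node work, and values flow one way through a bounded queue that stands in for the communication channel.

```python
import queue
import threading


class Node:
    """Singly linked list node (a recursive data structure)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_list(values):
    """Build a linked list that holds `values` in order."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


_DONE = object()  # sentinel that marks the end of the stream


def stage_traverse(head, channel):
    """Stage 1: the loop-carried recurrence (node = node.next)."""
    node = head
    while node is not None:
        channel.put(node.value)  # blocks while the bounded queue is full
        node = node.next
    channel.put(_DONE)


def stage_compute(channel, result):
    """Stage 2: per-node work; nothing flows back to stage 1."""
    total = 0
    while True:
        item = channel.get()  # blocks while the queue is empty
        if item is _DONE:
            break
        total += item * item
    result.append(total)


def dswp_sum_of_squares(values, capacity=4):
    """Run both stages on separate threads (assumed two-stage split)."""
    channel = queue.Queue(maxsize=capacity)  # software communication channel
    result = []
    t1 = threading.Thread(target=stage_traverse,
                          args=(build_list(values), channel))
    t2 = threading.Thread(target=stage_compute, args=(channel, result))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return result[0]
```

For example, `dswp_sum_of_squares([1, 2, 3, 4])` traverses four nodes in the first stage while the second stage accumulates their squares. The sketch also hints at why thread placement can matter for correctness: if both stages were ever scheduled onto one thread, the producer would block on a full queue and the consumer would never run. Whether this is the exact scheduling problem the binding clause addresses is an assumption, since the abstract does not say.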

Key words: Automatic parallelization, OpenMP, Decoupled software pipelining, Task scheduling mechanism, GCC

[1] Benoit A, Melhem R, Renaud-Goud P, et al. Power-aware Manhattan routing on chip multiprocessors [C] ∥ Proceedings of 26th International Parallel and Distributed Processing Symposium. Shanghai, 2012: 189-200
[2] Jin Hao-qiang, Jespersen D, Mehrotra P, et al. High performance computing using MPI and OpenMP on multi-core parallel systems [J]. Parallel Computing, 2011, 37(9): 562-575
[3] Ding Rui, Zhao Rong-cai, Han Lin. Automatic computation and data partitioning algorithm based on dominant value [J]. Computer Science, 2012, 39(3): 290-294 (in Chinese)
[4] Allen R, Kennedy K. Optimizing compilers for modern architectures: a dependence-based approach [M]. California: Morgan Kaufmann Publishers, 2001: 63-68
[5] Lin Yu-te, Wang Shao-chung, Shih Wen-li, et al. Enable OpenCL compiler with Open64 infrastructures [C] ∥ Proceedings of 13th IEEE International Conference on High Performance Computing and Communications. Alberta, 2011: 863-868
[6] Gerber R, Smith K B, Bik A J C, et al. The software optimization cookbook: high-performance recipes for IA-32 platforms (2nd ed) [M]. Hillsboro: Intel Press, 2006: 13-27
[7] Ottoni G, Rangan R, Stoler A, et al. Automatic thread extraction with decoupled software pipelining [C] ∥ Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture. Washington, DC, 2005: 105-118
[8] August D I, Connors D A, Mahlke S A, et al. Integrated predication and speculative execution in the IMPACT EPIC architecture [C] ∥ Proceedings of the 25th International Symposium on Computer Architecture. Barcelona, 1998: 227-237
[9] Fu Hong-yi, Ding Yan, Song Wei, et al. An OpenMP fault-tolerance mechanism implemented with parallel recomputation [J]. Journal of Software, 2012, 23(2): 411-427 (in Chinese)
[10] Thoman P, Jordan H, Pellegrini S, et al. Automatic OpenMP loop scheduling: a combined compiler and runtime approach [C] ∥ Proceedings of 8th International Workshop on OpenMP. Rome, 2012: 88-101
[11] Ramshankar R. Open64 Compiler Developer Guide. http://developer.amd.com/tools/cpu/open64/Documents/open64_compiler_developer_guide.html, 2009-12
[12] Hurson A R, Lim J T, Kavi K M, et al. Parallelization of DOALL and DOACROSS loops: a survey [J]. Advances in Computers, 1997, 45: 53-103
