Computer Science, 2025, Vol. 52, Issue (9): 137-143. doi: 10.11896/jsjkx.241200072
徐金龙1,3, 王庚武1,2, 韩林1, 聂凯1, 李浩然1, 陈梦尧1,2, 刘浩浩1,2
XU Jinlong1,3, WANG Gengwu1,2, HAN Lin1, NIE Kai1, LI Haoran1, CHEN Mengyao1,2, LIU Haohao1,2
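Abstract: The scheduling strategy is an important part of compiler parallelization; its role is to keep the load balanced across the cores of a multi-core processor. However, during automatic parallel compilation the current Sunway GCC compiler partitions loop iterations with static scheduling by default. This causes load imbalance in irregular loop structures and lowers the runtime efficiency of parallel programs on the Sunway platform. To address this problem, the proposed method weighs scheduling overhead against load balance and incorporates a trapezoid scheduling strategy to improve the original scheduling strategies of Sunway GCC, raising the parallelization efficiency of the Sunway GCC compiler. The scheduling strategy targets the SW3231 processor. Its correctness is tested on 844 parallel test cases from the GCC compiler functional test suite, and its performance is evaluated on the SPEC OMP 2012 benchmark suite and on typical applications covering four loop types. Experimental results show that, compared with the three standard scheduling strategies in Sunway GCC, the trapezoid scheduling algorithm achieves performance improvements of up to 1.10× and 4.54× on these two workloads, respectively. The method can improve the thread-level parallelization efficiency of the Sunway GCC compiler in scientific computing programs and can serve as a reference for parallelizing compilation on the Sunway processor platform.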
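To make the load-imbalance argument concrete, the sketch below compares static block partitioning with a classic trapezoid self-scheduling scheme, in which chunk sizes shrink linearly from roughly N/(2P) down to 1. It is an illustration only and is not the algorithm implemented in Sunway GCC: the function names, the default first and last chunk sizes, and the cost model (iteration i costs i + 1 time units, an irregular "triangular" loop) are all assumptions made for this example.

```python
import heapq
import math


def trapezoid_chunks(n_iters, n_threads, last=1):
    """Chunk sizes that decrease linearly from about N/(2P) to `last`.

    Hypothetical helper: the first/last sizes follow the textbook
    trapezoid scheme, not necessarily the values used in the paper.
    """
    if n_iters <= 0:
        return []
    first = max(last, math.ceil(n_iters / (2 * n_threads)))
    n_chunks = math.ceil(2 * n_iters / (first + last))
    step = (first - last) / (n_chunks - 1) if n_chunks > 1 else 0.0
    chunks, remaining, size = [], n_iters, float(first)
    while remaining > 0:
        # Round to the nearest integer, never below `last`,
        # and never beyond the iterations still unassigned.
        c = min(remaining, max(last, int(size + 0.5)))
        chunks.append(c)
        remaining -= c
        size -= step
    return chunks


def static_makespan(costs, n_threads):
    """Finish time when each thread gets one contiguous equal-sized block."""
    block = math.ceil(len(costs) / n_threads)
    return max(sum(costs[t * block:(t + 1) * block])
               for t in range(n_threads))


def self_scheduled_makespan(costs, chunks, n_threads):
    """Finish time when each chunk goes to the thread that is idle first."""
    finish = [0] * n_threads
    heapq.heapify(finish)
    start = 0
    for c in chunks:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + sum(costs[start:start + c]))
        start += c
    return max(finish)


if __name__ == "__main__":
    costs = list(range(1, 1001))  # assumed cost model: iteration i costs i + 1
    chunks = trapezoid_chunks(len(costs), 4)
    print("static   :", static_makespan(costs, 4))
    print("trapezoid:", self_scheduled_makespan(costs, chunks, 4))
```

Under this assumed cost model the static partition hands the most expensive quarter of the iterations to a single thread, whereas the shrinking chunks let idle threads pick up the remaining work in progressively smaller pieces. The sketch ignores the per-chunk dispatch overhead that a real runtime has to trade off against balance.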