Computer Science, 2025, Vol. 52, Issue (9): 137-143. doi: 10.11896/jsjkx.241200072

• High-Performance Computing •

Research on Parallel Scheduling Strategy Optimization Technology Based on Sunway Compiler

XU Jinlong1,3, WANG Gengwu1,2, HAN Lin1, NIE Kai1, LI Haoran1, CHEN Mengyao1,2, LIU Haohao1,2

  1 National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China
  2 School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  3 Information Engineering University, Zhengzhou 450001, China
  • Received: 2024-12-10 Revised: 2025-05-15 Online: 2025-09-15 Published: 2025-09-11
  • Corresponding author: NIE Kai (ieknie@zzu.edu.cn)
  • Author contact: longkaizh@163.com
  • About author: XU Jinlong, born in 1985, Ph.D., lecturer, master's supervisor. His main research interests include high-performance computing and parallel compilation.
    NIE Kai, born in 1987, Ph.D., lecturer, postgraduate supervisor. His main research interests include advanced compilation techniques, high-performance computing, etc.
  • Supported by:
    2024 Henan Provincial Major Science and Technology Project (241100210100), 2024 Henan Provincial Science and Technology Research Project (242102211094), 2023 National Key R&D Program for High-Performance Computing (2023YFB3002505) and 2022 Henan Provincial Major Science and Technology Project (221100210600).

Abstract: Scheduling strategies are an important part of compiler parallelization; their role is to keep the load balanced across the cores of a multi-core processor. However, during automatic parallelization the Sunway GCC compiler partitions loop iterations with static scheduling by default, which causes load imbalance in irregular loop structures and degrades the performance of parallel programs on the Sunway platform. To address this problem, the proposed method improves the existing scheduling strategy of Sunway GCC by combining it with a trapezoid scheduling strategy that trades off scheduling overhead against load balance, raising the parallelization efficiency of the compiler. On the SW3231 processor, the strategy was tested for correctness on 844 parallel test cases from the GCC compiler functional test suite, and its performance was evaluated on the SPEC OMP 2012 benchmark suite and on typical applications covering four loop types. Compared with the three standard scheduling strategies in Sunway GCC, the trapezoid scheduling algorithm achieves maximum performance improvements of 1.10 and 4.54, respectively. The method improves the thread-level parallelization efficiency of Sunway GCC for scientific computing programs and can serve as a reference for parallelizing compilation on the Sunway processor platform.

Keywords: OpenMP scheduling strategy, load balancing, trapezoid self-scheduling, scheduling overhead, Sunway GCC
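The abstract does not spell out the algorithm added to Sunway GCC, so the sketch below shows only the classic trapezoid self-scheduling (TSS) rule of Tzen and Ni as a point of reference: the first chunk is N/(2P) iterations, the last is 1, and sizes shrink linearly in between. Everything else is an assumption made for illustration: the function names are hypothetical, the irregular loop is modeled as a triangular loop where iteration i does i + 1 units of work, dispatch overhead is ignored, and the block partition only approximates a default `schedule(static)`, not Sunway GCC's actual runtime.

```python
import heapq
import math
from typing import Callable, List, Optional


def tss_chunks(n_iters: int, n_threads: int,
               first: Optional[int] = None, last: int = 1) -> List[int]:
    """Chunk sizes for trapezoid self-scheduling (TSS), after Tzen and Ni (1993).

    Sizes shrink linearly from `first` (default N / (2P)) toward `last`:
    large early chunks keep the number of dispatches low, and small late
    chunks let idle threads absorb leftover work near the end of the loop.
    """
    if n_iters <= 0:
        return []
    if first is None:
        first = max(1, n_iters // (2 * n_threads))
    n_chunks = math.ceil(2 * n_iters / (first + last))  # planned chunk count
    delta = (first - last) / (n_chunks - 1) if n_chunks > 1 else 0.0
    chunks: List[int] = []
    remaining, size = n_iters, float(first)
    while remaining > 0:
        # Truncating to int may leave a short final chunk; never go below `last`.
        take = min(remaining, max(last, int(size)))
        chunks.append(take)
        remaining -= take
        size -= delta
    return chunks


def static_block(n_iters: int, n_threads: int) -> List[int]:
    """One contiguous block per thread, sizes differing by at most one
    (block partitioning in the style of OpenMP schedule(static))."""
    base, extra = divmod(n_iters, n_threads)
    return [base + (1 if t < extra else 0) for t in range(n_threads)]


def triangular_cost(i: int) -> int:
    """Hypothetical irregular loop: iteration i performs i + 1 units of work."""
    return i + 1


def makespan(chunks: List[int], n_threads: int,
             cost: Callable[[int], int], self_scheduled: bool) -> int:
    """Finish time of the slowest thread; dispatch overhead is NOT modeled."""
    work, start = [], 0
    for size in chunks:
        work.append(sum(cost(i) for i in range(start, start + size)))
        start += size
    if not self_scheduled:
        return max(work)  # chunk t is pre-assigned to thread t
    # Self-scheduling: each chunk, in loop order, goes to the first idle thread.
    heap = [(0, t) for t in range(n_threads)]
    for w in work:
        finish, t = heapq.heappop(heap)
        heapq.heappush(heap, (finish + w, t))
    return max(finish for finish, _ in heap)


if __name__ == "__main__":
    N, P = 1000, 4
    tss = tss_chunks(N, P)
    print("TSS chunks:", tss)
    print("static makespan:", makespan(static_block(N, P), P, triangular_cost, False))
    print("TSS makespan:   ", makespan(tss, P, triangular_cost, True))
    print("ideal makespan: ", sum(triangular_cost(i) for i in range(N)) / P)
```

With N = 1000 and P = 4, this TSS sketch issues 15 chunks instead of the 1000 dispatches a chunk-size-1 dynamic schedule would need, while the static block partition leaves the last thread with 218875 of the 500500 total work units, far above the ideal 125125. Because dispatch cost is ignored, the simulation shows only the load-balance side of the trade-off; the overhead side appears only as the chunk count.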

CLC number: TP314