Research on Parallel Scheduling Strategy Optimization Technology Based on Sunway Compiler

doi:10.11896/jsjkx.241200072

Computer Science, 2025, Vol. 52, Issue (9): 137-143.




Corresponding author: NIE Kai (ieknie@zzu.edu.cn)
About author: longkaizh@163.com

Research on Parallel Scheduling Strategy Optimization Technology Based on Sunway Compiler

XU Jinlong¹,³, WANG Gengwu¹,², HAN Lin¹, NIE Kai¹, LI Haoran¹, CHEN Mengyao¹,², LIU Haohao¹,²

1 National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China
2 School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
3 Information Engineering University, Zhengzhou 450001, China

Received: 2024-12-10 Revised: 2025-05-15 Online: 2025-09-11 Published: 2025-09-15
About author: XU Jinlong, born in 1985, Ph.D., lecturer, master's supervisor. His main research interests include high-performance computing and parallel compilation.
NIE Kai, born in 1987, Ph.D., lecturer, postgraduate supervisor. His main research interests include advanced compilation techniques, high-performance computing, etc.
Supported by: 2024 Henan Provincial Major Science and Technology Project (241100210100), 2024 Henan Provincial Science and Technology Research Project (242102211094), 2023 National Key R&D Program for High-Performance Computing (2023YFB3002505) and 2022 Henan Provincial Major Science and Technology Project (221100210600).

Abstract



Scheduling strategies are an important part of compiler parallelization: they keep the load balanced across the cores of a multi-core processor. However, during automatic parallelization the Sunway GCC compiler by default uses static scheduling to partition loop iterations. This causes load imbalance in irregular loop structures and degrades the performance of parallel programs on the Sunway platform. To address this problem, the proposed method weighs scheduling overhead against load balance and combines this trade-off with a trapezoid scheduling strategy. It improves the original scheduling strategy of Sunway GCC and raises the compiler's parallelization efficiency. The strategy was implemented on the SW3231 processor. Its correctness was verified on 844 parallel test cases from the GCC compiler functional test suite. Its performance was evaluated on the SPEC OMP 2012 benchmark suite and on typical applications covering four loop types. Experimental results show that, compared with the three standard scheduling strategies in Sunway GCC, the trapezoid scheduling algorithm achieves maximum performance improvements of 1.10 and 4.54 on these two evaluation sets, respectively. The method improves the thread-level parallelization efficiency of Sunway GCC for scientific computing programs and can serve as a reference for parallelizing compilation on the Sunway processor platform.

Keywords: OpenMP scheduling strategy, load balancing, trapezoid self-scheduling, scheduling overhead, Sunway GCC

CLC number: TP314


XU Jinlong, WANG Gengwu, HAN Lin, NIE Kai, LI Haoran, CHEN Mengyao, LIU Haohao. Research on Parallel Scheduling Strategy Optimization Technology Based on Sunway Compiler[J]. Computer Science, 2025, 52(9): 137-143. https://doi.org/10.11896/jsjkx.241200072

