计算机科学 ›› 2025, Vol. 52 ›› Issue (6): 44-51.doi: 10.11896/jsjkx.240300166

• 计算机软件 • 上一篇    下一篇

基于循环代价分析的循环不变量外提算法

姜军, 翟彦河, 曾志恒, 顾轶超, 黄亮明   

  1. 无锡先进技术研究院 江苏 无锡 214122
  • 收稿日期:2024-03-26 修回日期:2024-07-15 出版日期:2025-06-15 发布日期:2025-06-11
  • 通讯作者: 黄亮明(liangming_huang@126.com)
  • 作者简介:(goodsun_jj@163.com)
  • 基金资助:
    科技部重点支持项目(GG20210701)

Loop-invariant Code Motion Algorithm Based on Loop Cost Analysis

JIANG Jun, ZHAI Yanhe, ZENG Zhiheng, GU Yichao, HUANG Liangming   

  1. Wuxi Institute of Advanced Technology,Wuxi,Jiangsu 214122,China
  • Received:2024-03-26 Revised:2024-07-15 Online:2025-06-15 Published:2025-06-11
  • About author:JIANG Jun,born in 1980,master,senior engineer.His main research interests include compiler optimization,architecture-oriented performance analysis and optimization,etc.
    HUANG Liangming,born in 1988,Ph.D,engineer.His main research in-terests include compiler optimization,architecture-oriented performance ana-lysis and optimization,etc.
  • Supported by:
    Key Project of the Ministry of Science and Technology(GG20210701).

摘要: 循环不变量外提算法是一种针对程序中循环结构的常用编译优化算法,其通过将循环体中的不变计算移动到循环外部来减少重复计算的开销,从而提高程序运行的速度。但在LLVM编译器中,传统的循环不变量外提算法会将全部循环不变量外提到循环体外部,当循环不变量达到一定数量时会导致寄存器溢出,在循环内引入额外的访存代价,对循环产生负优化效果。针对上述问题,在传统LLVM循环不变量外提算法的基础上,引入了一种循环代价分析算法,通过计算循环不变量在循环体中的运行代价和外提操作可能带来的溢出代价,评估其外提可能带来的收益,只对产生正收益的循环不变量进行外提,在有效减少循环体内重复计算的同时,规避引入额外开销的风险。在国产申威831处理器平台,使用典型用例进行优化效果测评,在千万级循环下,相较于传统循环不变量优化算法,提出的新优化算法具有17%以上的性能提升;使用SPEC CPU2017基准测试集(SPECspeed 2017 Interger套件)、Perl解释器DKbench基准测试集、Python解释器pyperformance基准测试集进行综合优化效果测评,结果表明,相较于传统循环不变量优化算法,提出的新优化算法分别具有0.4%,0.63%和1%的性能提升。

关键词: LLVM编译器, 编译优化, 循环不变量外提, 寄存器溢出, 循环代价分析

Abstract: Loop-invariant code motion(LICM) is a commonly used compilation optimization algorithm for loop structures in programs.By moving the invariant calculations in the loop body to outside the loop,the algorithm reduces the overhead of duplicate calculations,thus improving program execution speed.However,in LLVM compiler,the traditional LICM algorithm hoists all loop-invariants outside the loop body,which will lead to register overflow when the number of loop-invariant reaches a certain level.It will introduce additional memory access cost in the loop,resulting in a negative optimization effect on the loop.To address this issue,a loop cost analysis algorithm is introduced based on the traditional LLVM LICM algorithm.This algorithm evaluates the running cost of loop-invariant code inside the loop and the overflow cost that may be caused by moving the code outside the loop,and assesses the benefits of moving the code outside the loop.Only the loop-invariant code that produces positive benefits is moved outside the loop,effectively reducing the overhead of duplicate calculations in the loop while avoiding the risk of introducing additional costs.Theproposed new optimization algorithm achieves more than 17% performance improvement compared to the traditional LICM algorithm in typical use cases for the domestic SW831 processor platform under millions of loops.Comprehensive optimization effect evaluations are conducted using the SPEC CPU 2017 benchmark test suite(SPECspeed2017 Integer Suite),Perl interpreter DKbench benchmark test suite,and Python interpreter pyperformance benchmark test suite.The results show that compared with the traditional LICM algorithm,the proposed algorithm has 0.4%,0.63% and 1% performance improvement respectively.

Key words: LLVM compiler, Compilation optimization, Loop-invariant code motion, Register overflow, Loop cost analysis

中图分类号: 

  • TP314
[1]LATTNER C,ADVE V.LLVM:A compilation framework for lifelong program analysis & transformation[C]//International Symposium on Code Generation and Optimization.IEEE,2004:75-86.
[2]CHO J,CHO D,KIM Y.Study on LLVM application in ParallelComputing System[J].The Journal of the Convergence on Culture Technology,2019,5(1):395-399.
[3]WANG L,GAO K,ZHAO Y Q,et al.Deep Learning Compiler Load Balancing Optimization Method for Model Training[J].Journal of Frontiers of Computer Science and Technology,2024,18(1):111-126.
[4]ZHU X L.The implementation and optimization of deep learningcompiler based on GPDSP[D].Changsha:National University of Defense Technology,2021.
[5]XIE H C.An Auto Performance Predict Research for Scientific Program Based on LLVM[D].Harbin:Harbin Institute of Technology,2016.
[6]PEELER H,LI S S,SLOSS A N,et al.Optimizing LLVM pass sequences with Shackleton:a linear genetic programming framework[C]//Proceedings of the Genetic and Evolutionary Computation Conference Companion.2022:578-581.
[7]CHAI Y,CHEN M,LI J,et al.Implementation and optimization of data prefetching algorithm based on LLVM compilation system[C]//6th International Conference on Electronic Technology and Information Science.2021.
[8]HUANG Z F,SHANG J D.Optimization based on LLVM globalinstruction selection[C]//2021 International Conference on Computer Network Security and Software Engineering.2021.2021.
[9]AHO A V,SETHI R,ULLMAN J D.Compilers:principles,techniques,and tools[M].Reading:Addison-Wesley,1986.
[10]ZAKOWSKI Y,BECK C,YOON I,et al.Modular,composi-tional,and executable formal semantics for LLVM IR[J].Pro-ceedings of the ACM on Programming Languages,2021,5(ICFP):1-30.
[11]ANDREW W A.SSA is Functional Programming[J].Acm Sigplan Notices,1998,33(4):17-20.
[12]YAN Y,PAN Z,YU L,et al.Research on the influencing factors of LLVM IR optimization effect[C]//2023 IEEE 3rd International Conference on Information Technology,Big Data and Artificial Intelligence(ICIBA).IEEE,2023,3:756-763.
[13]SHANG J,XU K,HAN L,et al.Optimization of Access Ad-dress Calculation for LLVM[C]//2023 4th International Conference on Information Science,Parallel and Distributed Systems(ISPDS).IEEE,2023:458-464.
[14]RODRIGUEZ-CANCIO M,COMBEMALE B,BAUDRY B.Automatic microbenchmark generation to prevent dead code elimination and constant folding[C]//Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering.2016:132-143.
[15]GUO Z H,WU Y X,AN L F,et al.Study on function inlining optimization technologies based on LLVM[J].Computer Engineering and Applications,2017,53(3):41-46.
[16]WANG C X,HAN L,LIU H H.Optimization of loop unrolling based on instruction Cache and register pressure[J].Computer Engineering & Science,2022,44(12):2111-2119.
[17]HU Y X,ZHEGN Q L.Research on Deep Learning Based Loop Auto-schedule[J].Journal of Chinese Computer Systems,2024,45(7):1770-1777.
[18]MELLOR-CRUMMEY J,ADVE V.Simplifying control flow in compiler-generated parallel code[J].International Journal of Parallel Programming,1998,26(5):613-638.
[19]LI H X,LIU J.Methods of Data Flow Analysis[J].ComputerEngineering and Applications,2003,39(13):142-144.
[20]GONG L Q,SHEN L,ZHOU Q L,et al.Research and Implementation of Compile Lock Mechanism Based on LLVM[J].Computer Application and Software,2021,38(11):11-17.
[21]EBNER D,BRANDNER F,SCHOLZ B,et al.Generalized in-struction selection using SSA-graphs[C]//Proceedings of the 2008 ACM SIGPLAN-SIGBED Conference on Languages,Compilers,and Tools for Embedded Systems.2008:31-40.
[22]LOZANO R C,CARLSSON M,DREJHAMMAR F,et al.Constraint-based register allocation and instruction scheduling[C]//International Conference on Principles and Practice of Constraint Programming.Berlin,Heidelberg:Springer Berlin Heidelberg,2012:750-766.
[23]DE SOUZA XAVIER T C,OLIVEIRA G S,DE LIMA E D,et al.A detailed analysis of the llvm's register allocators[C]//2012 31st International Conference of the Chilean Computer Science Society.IEEE,2012:190-198.
[24]LIANG J L,HUA B J,LV Y S,et al.Loop Invariant Code Motion Algorithm for Deep Learning Operators[J].Journal of Frontiers of Computer Science and Technology,2023,17(1):127-139.
[25]SHI H K,WANG Z S,ZHANG S Z,et al.Performance Evaluation Benchmark of General-Purpose CPU:A Survey[J].Acta Electronica Sinica,2023,51(1):246-256.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!