Computer Science ›› 2025, Vol. 52 ›› Issue (5): 67-75.doi: 10.11896/jsjkx.241100175

• High Performance Computing • Previous Articles     Next Articles

TS3:Energy-Efficiency-First Optimal Thread Number Search Algorithm Based on Specific Starting Point Classification

MA Zhaoyang, CHEN Juan, ZHOU Yichang, WU Xianyu, GAO Pengfei, RUAN Wenhao, ZHAN Haoming   

  1. College of Computer Science and Technology,National University of Defense Technology,Changsha 410073,China
  • Received:2024-11-28 Revised:2025-02-27 Online:2025-05-15 Published:2025-05-12
  • About author:MA Zhaoyang,born in 2004,undergra-duate.His main research interests include high performance computing and so on.
    CHEN Juan,born in 1980,Ph.D,professor.Her main research interests include high performance computing,low-power compiler and power management,etc.
  • Supported by:
    Parallel and Distributed Computing National Key Laboratory Fund(2023-KJWPDL-01).

Abstract: Optimal thread number setting is one of the key factors affecting the performance and power consumption of multi-threaded programs.However,current algorithms for finding the optimal number of threads usually start the search from a single fixed point,which cause the problem of low precision and large search overhead.The distribution and location of the optimal number of threads are related to various,factors,including types of programs,optimization objectives(performance,power consumption,and EDP),parallel multi-threaded areas,and software-hardware configuration parameters.This paper focuses on the problem of searching for the optimal number of threads with an emphasis on energy efficiency and proposes an energy-efficiency-first optimal thread number search algorithm based on specific starting point classification(abbreviated as TS3 method).By designing a multi-threaded program classifier to optimize the setting of search starting points,and applying heuristic and binary search algorithms to enhance search efficiency,the method effectively improves the accuracy of the optimal number of threads search under energy efficiency priorities(optimal performance,optimal powerconsumption,optimal EDP) and reduces search costs.The effectiveness of the algorithm is experimentally validated using eight benchmarks on two x86 platforms and one ARM platform.Compared to the baseline,the TS3method achieves an average performance improvement of 0.29%(Platform A),0.17%(Platform B),and 10.77%(Platform C);average power consumption reduction of 2.35%(Platform A),1.87%(PlatformB),and 15.97%(Platform C);and average EDP reduction of 6.36%(Platform A),5.07%(Platform B),and 46.94%(Platform C).Across the three platforms,compared to current classical search methods,the TS3 method demonstrates an average performance improvement of 10.16%,an average reduction in power consumption of 13.45%,and an average reduction in EDP of 23.77%,the search overhead is reduced by 86.8%.

Key words: Multithreaded program, Energy efficiency optimization, Optimal number of threads, Random forest algorithm, Heuristic algorithm

CLC Number: 

  • TP302
[1]WANG R,LU K,CHEN J,et al.Brief introduction of tianhe exascaleprototype system[J].Tsinghua Science and Technology,2021,26(3):361-369.
[2]DUTOT P F,GEORGIOU Y,GLESSER D,et al.Towardsenergy budget control in hpc[C]//2017 17th IEEE/ACM International Symposium on Cluster,Cloud and Grid Computing(CCGRID).2017:381-390.
[3]LEE J,PARK J H,KIM H,et al.Adaptive execution techniques of parallel programs for multiprocessors[J].Journal of Parallel and Distributed Computing,2010,70(5):467-480.
[4]SULEMAN M A,QURESHI M K,PATT Y N.Feedback-driven threading:power-efficient and high-performance execution of multi-threadedworkloads on cmps[C]//Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems.New York:ACM,2008:277-286.
[5]SUBRAMANIAN L,SESHADRI V,KIM Y,et al.Mise:Providing performance predictability and improving fairness in shared main memory systems[C]//2013 IEEE 19th International Symposium on High Performance Computer Architecture(HPCA).2013:639-650.
[6]JOAO J A,SULEMAN M A,MUTLU O,et al.Bottleneck identification and scheduling in multithreaded applications[J].ACM SIGARCH Computer Architecture News,2012,40(1):223-234.
[7]CHEN J,ZHOU W,DONG Y,et al.Analyzingtime-dimensioncommunication characterizations for representative scientific applications onsupercomputer systems[J].Frontiers of Computer Science,2019,13(6):1228-1242.
[8]NASA.Nasparallel benchmarks[EB/OL].https://www.nas.nasa.gov/software/npb.html.
[9]MOORI M K,ROCHA H M G D A,LORENZON A F,et al.Searching for the ideal number of threads on asymmetric multiprocessors[C]//2023 XIII Brazilian Symposium on Computing Systems Engineering(SBESC).2023:1-6.
[10]MALAVE S H,SHINDE S K.Adaptive manta ray foraging optimizer fordetermining optimal thread count on many-core architecture[C]//Congress on Intelligent Systems.Singapore:Springer,2023:209-222.
[11]MARR D T,BINNS F,HILL D L,et al.Hyper-ThreadingTechnology in Microprocessor[J].Intel Technology Journal,2014,2(10):1-12.
[12]JUE R,HENG L.Fast and accurate long-read assembly with wtdbg2[J].Nature Methods,2019,17(2):155-158.
[13]LORENZON A F,DE OLIVEIRA C C,SOUZA J D,et al.Aurora:Seamless Optimization of OpenMP Applications[J]IEEE Transactions on Parallel and Distributed Systems,2019,3(5):1007-1021.
[14]PUSUKURI K K,GUPTA R,BHUYAN L N.Thread reinforcer:Dynamically determining number of threads via os level monitoring[C]//2011 IEEE International Symposium on Workload Characterization(IISWC).2011:116-125.
[15]LEE J,WU H,RAVICHANDRAN M,et al.Thread tailor:dynamically weaving threads together for efficient,adaptive parallel applications[J].ACM SIGARCH Computer Architecture News,2010,38(3):270-279.
[16]MOORE R W,CHILDERS B R.Using utilityprediction models to dynamically choose program thread counts[C]//2012 IEEE International Symposium on Performance Analysis of Systems Software.2012:135-144.
[17]DANCHEVA T,GUSEV M,ZDRAVEVSKI V,et al.AnOpenMP runtime profiler/configuration tool for dynamic optimization of the number of threads[C]//2016 39th International Convention on Information and Communication Technology,Electronics and Microelectronics(MIPRO).IEEE,2016:192-197.
[18]SCHWARZROCK J,LORENZON A F,NAVAUX P O,et al.Potentialgains in EDP by dynamically adapting the number of threads for OpenMP applications in embedded systems[C]//2017 VII Brazilian Symposiumon Computing Systems Enginee-ring(SBESC).IEEE,2017:79-85.
[19]PESTEL S D,STEEN S V D,AKRAM S,et al.RPPM:Rapid performanceprediction of multithreaded workloads on multicore processors[C]//2019 IEEE International Symposium on Performance Analysis of Systems and Software.2019.
[20]WANG W,DAVIDSON J W,SOFFA M L.Predicting the me-mory bandwidth and optimal core allocations for multi-threaded applications onlarge-scale numa machines[C]//2016 IEEE International Symposiumon High Performance Computer Architecture(HPCA).2016:419-431.
[21]CURTIS-MAURY M,DING X,ANTONOPOULOS C D,et al.An evaluation of OpenMP on current and emerging multithreaded/multicore processors[C]LNTCS.2008:133-144.
[22]NIEPLOCHA J,MÁRQUEZ A,FEO J,et al.Evaluating the po-tential of multithreaded platforms for irregular scientific computations[C]//Proceedings of the 4th International Conference on Computing Frontiers.2007:47-58.
[23]MOSELEY T.Methods for modeling resource contention on simultaneousmultithreading processors[C]//2005 International Conference on Computer Design.2005.
[24]SANJUAN-ESTRADAJF,CASADOLG,GARCÍAI.Adaptive pa-rallel interval branch and bound algorithms based on their performance for multicore architectures[J].the Journal of Supercomputing,2011,58:376-384.
[1] LI Zhoucheng, ZHANG Yi, SUN Jin. Stochastic Optimization Method for Multi-exit Deep Neural Networks for Edge Intelligence Applications [J]. Computer Science, 2025, 52(4): 85-93.
[2] LIN Zheng, LIU Sicong, GUO Bin, DING Yasan, YU Zhiwen. Adaptive Operator Parallel Partitioning Method for Heterogeneous Embedded Chips in AIoT [J]. Computer Science, 2025, 52(2): 299-309.
[3] LI Haixia, SONG Danlei, KONG Jianing, SONG Yafei, CHANG Haiyan. Evaluation of Hyperparameter Optimization Techniques for Traditional Machine Learning Models [J]. Computer Science, 2024, 51(8): 242-255.
[4] YANG Da, LUO Liang, ZHENG Long. New Global Optimization Algorithm:Carbon Cycle Algorithm [J]. Computer Science, 2023, 50(6A): 220300131-7.
[5] HUANG Hua, JIANG Jun, YANG Yongkang, CAO Bin. Online Service Function Chain Orchestration Method for Profit Maximization [J]. Computer Science, 2023, 50(6): 66-73.
[6] WANG Wen-qiang, JIA Xing-xing, LI Peng. Adaptive Ensemble Ordering Algorithm [J]. Computer Science, 2022, 49(6A): 242-246.
[7] GENG Hai-jun, WANG Wei, YIN Xia. Single Node Failure Routing Protection Algorithm Based on Hybrid Software Defined Networks [J]. Computer Science, 2022, 49(2): 329-335.
[8] LIU Wen-wen, XIONG Wei, HAN Chi. Communication Satellite Task Relaxation Scheduling Method Based on Improved Hyper-heuristic Algorithm [J]. Computer Science, 2022, 49(11A): 210900125-6.
[9] LIU Zhong-hui, ZHAO Qi, ZOU Lu, MIN Fan. Heuristic Construction of Triadic Concept and Its Application in Social Recommendation [J]. Computer Science, 2021, 48(6): 234-240.
[10] GUO Qi-cheng, DU Xiao-yu, ZHANG Yan-yu, ZHOU Yi. Three-dimensional Path Planning of UAV Based on Improved Whale Optimization Algorithm [J]. Computer Science, 2021, 48(12): 304-311.
[11] GUO Fei-yan, TANG Bing. Mobile Edge Server Placement Method Based on User Latency-aware [J]. Computer Science, 2021, 48(1): 103-110.
[12] ZHANG Xu, WANG Li-li, YANG Bo-tao. Heuristic Algorithms for Two-dimensional Irregular Bin Packing Problem with GuillotineConstraints [J]. Computer Science, 2020, 47(5): 212-216.
[13] YANG Ting, LUO Fei, DING Wei-chao, LU Hai-feng. Bin Packing Algorithm Based on Adaptive Optimization of Slack [J]. Computer Science, 2020, 47(4): 211-216.
[14] LI Meng-ke, ZHENG Qiu-sheng, WANG Lei. Dynamic Hybrid Data Race Detection Algorithm Based on Sampling Technique [J]. Computer Science, 2020, 47(10): 315-321.
[15] LUO Fei, REN Qiang, DING Wei-chao, LU Hai-feng. Heuristic One-dimensional Bin Packing Algorithm Based on Minimum Slack [J]. Computer Science, 2019, 46(9): 315-320.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!