Computer Science ›› 2021, Vol. 48 ›› Issue (6): 1-9. doi: 10.11896/jsjkx.201200115

• Computer Architecture •

Performance Skeleton Analysis Method Towards Component-based Parallel Applications

FU Tian-hao1,3, TIAN Hong-yun1, JIN Yu-yang2, YANG Zhang1, ZHAI Ji-dong2, WU Lin-ping1, XU Xiao-wen1

  1 Institute of Applied Physics and Computational Mathematics, Beijing 100094, China
  2 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  3 Graduate School of China Academy of Engineering Physics (CAEP), Beijing 100088, China
  • Received: 2020-12-12 Revised: 2021-03-23 Online: 2021-06-15 Published: 2021-06-03
  • Corresponding author: XU Xiao-wen (xwxu@iapcm.ac.cn)
  • About author: FU Tian-hao, born in 1996, master candidate, is a student member of China Computer Federation. His main research interests include high performance computing and performance analysis for parallel applications. (futianhao18@gscaep.ac.cn)
    XU Xiao-wen, born in 1978, Ph.D, professor. His research interests include high performance numerical algorithms and software in scientific and engineering fields, and parallel programming frameworks for large-scale numerical simulations. He is a member of China Computer Federation, SIAM and CSIAM.
  • Supported by:
    National Key R&D Program of China (2017YFB0202103) and Science Challenge Project (TZ2019002).

Abstract: Performance skeleton analysis technology (PSTAT) describes the program structure of a parallel application and thereby provides the input for performance modeling. PSTAT is the basis of performance analysis and performance optimization for large-scale parallel applications. Targeting a class of component-based parallel applications in the field of numerical simulation, and building on dynamic and static structure analysis techniques for general program binaries, this paper proposes and implements a method that automatically generates the program structure as a "component-loop-call" tree (Component-Loop-Call-Tree, CLCT). On this basis, a performance skeleton analysis toolkit for component-based parallel applications, CLCT-STAT (CLCT SkeleTon Analysis Toolkit), is developed. The method automatically identifies the member function symbols of component classes in component-based applications and generates a performance skeleton of the parallel application with the component as the smallest unit. Tests on several component-based parallel applications show that, compared with generating the performance skeleton manually through analytical modeling, the proposed method provides richer program structure information and saves the time cost of manual analysis.

Key words: Component-loop-call tree, CLCT-STAT, Parallel computing component, Performance skeleton
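
As a rough illustration of the idea in the abstract, the sketch below builds a small loop/call tree and collapses every call to a component class member function into a leaf "component" node, so that the component becomes the smallest unit of the skeleton. It is only a minimal sketch under stated assumptions and not the CLCT-STAT implementation: the `Node` type, the functions `component_of` and `to_skeleton`, the class name `DiffusionComponent`, and the matching of demangled C++ symbols by class name are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """One node of a hypothetical component-loop-call tree."""
    kind: str   # "component", "loop", or "call"
    name: str
    children: List["Node"] = field(default_factory=list)


def component_of(symbol: str, component_classes: Set[str]) -> Optional[str]:
    """Return the class name if `symbol` is a member of a known component class.

    Assumes `symbol` is an already demangled C++ name such as
    "ns::DiffusionComponent::computing(double)".
    """
    qualified = symbol.split("(", 1)[0]          # drop the parameter list
    if "::" not in qualified:
        return None                              # free function, not a member
    scope = qualified.rsplit("::", 1)[0]         # everything before the method name
    cls = scope.rsplit("::", 1)[-1]              # innermost enclosing class
    return cls if cls in component_classes else None


def to_skeleton(node: Node, component_classes: Set[str]) -> Node:
    """Copy the tree, replacing component member calls with leaf component nodes."""
    if node.kind == "call":
        cls = component_of(node.name, component_classes)
        if cls is not None:
            # The component is the smallest unit: nothing below it is kept.
            return Node("component", cls)
    return Node(node.kind, node.name,
                [to_skeleton(child, component_classes) for child in node.children])


# Hypothetical input tree: main -> time-step loop -> component call and MPI call.
tree = Node("call", "main", [
    Node("loop", "time_step_loop", [
        Node("call", "ns::DiffusionComponent::computing(double)", [
            Node("loop", "patch_loop", [Node("call", "kernel_")]),
        ]),
        Node("call", "MPI_Barrier"),
    ]),
])
skeleton = to_skeleton(tree, {"DiffusionComponent"})
```

In this example the `patch_loop` and `kernel_` nodes below the component call are dropped from `skeleton`, while the enclosing loop and the non-component call `MPI_Barrier` are preserved.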

CLC Number:

  • TP302