Computer Science (计算机科学) ›› 2021, Vol. 48 ›› Issue (6): 1-9. doi: 10.11896/jsjkx.201200115

• Computer Architecture

Performance Skeleton Analysis Method Towards Component-based Parallel Applications

FU Tian-hao1,3, TIAN Hong-yun1, JIN Yu-yang2, YANG Zhang1, ZHAI Ji-dong2, WU Lin-ping1, XU Xiao-wen1

  1 Institute of Applied Physics and Computational Mathematics, Beijing 100094, China
  2 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  3 Graduate School of China Academy of Engineering Physics (CAEP), Beijing 100088, China
  • Received: 2020-12-12 Revised: 2021-03-23 Online: 2021-06-15 Published: 2021-06-03
  • Corresponding author: XU Xiao-wen (xwxu@iapcm.ac.cn)
  • About author: FU Tian-hao, born in 1996, master's candidate, is a student member of the China Computer Federation. His main research interests include high performance computing and performance analysis for parallel applications. (futianhao18@gscaep.ac.cn)
    XU Xiao-wen, born in 1978, Ph.D., professor. His research interests include high performance numerical algorithms and software in scientific and engineering fields, and parallel programming frameworks for large-scale numerical simulations. He is a member of the China Computer Federation, SIAM and CSIAM.
  • Supported by:
    National Key R&D Program of China (2017YFB0202103) and Science Challenge Project (TZ2019002).

Abstract: Performance skeleton analysis technology (PSTAT) characterizes the program structure of a parallel application and thereby supplies the input for performance modeling. PSTAT is the basis of performance analysis and performance optimization for large-scale parallel applications. Targeting a class of component-based parallel applications in the field of numerical simulation, and building on dynamic and static structure analysis techniques for general program binaries, this paper proposes and implements a method that automatically generates the program structure based on a “component-loop-call” tree (Component-Loop-Call-Tree, CLCT). On this foundation, a performance skeleton analysis toolkit for component-based parallel applications, CLCT-STAT (Component-Loop-Call-Tree SkeleTon Analysis Toolkit), is developed. The method automatically identifies the symbols of component class member functions in component-based applications and generates a performance skeleton of the parallel application with the component as the smallest unit. Tests on several component-based parallel applications show that, compared with generating the performance skeleton manually through analytical modeling, the proposed method provides richer program structure information and saves the time cost of manual analysis.

Key words: Parallel computing component, Performance skeleton, Component-loop-call tree, CLCT-STAT
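
To make the idea of a tree "with component as the smallest unit" concrete, the following is a minimal sketch and not the CLCT-STAT implementation, which works on program binaries. It assumes a loop-call tree is already available with demangled symbol names, and that a call counts as a component call when its class name appears in a given set. All names here (`Node`, `to_clct`, `InitComponent`, `StepComponent`, and the example symbols) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Node kinds of a hypothetical "component-loop-call" tree.
COMPONENT, LOOP, CALL = "component", "loop", "call"


@dataclass
class Node:
    kind: str
    name: str
    children: List["Node"] = field(default_factory=list)


def is_component_member(symbol: str, component_classes: Set[str]) -> bool:
    """Assumed rule: 'Class::method(args)' is a component call if Class is listed.

    The class part is everything before the last '::', so namespaced classes
    must be listed with their full qualification.
    """
    qualified = symbol.split("(", 1)[0]
    if "::" not in qualified:
        return False
    return qualified.rsplit("::", 1)[0] in component_classes


def to_clct(node: Node, component_classes: Set[str]) -> List[Node]:
    """Return the nodes that replace `node` in the pruned tree."""
    if node.kind == CALL and is_component_member(node.name, component_classes):
        # Component is the smallest unit: keep it as a leaf, do not descend.
        return [Node(COMPONENT, node.name)]
    kids = [k for c in node.children for k in to_clct(c, component_classes)]
    if node.kind == LOOP:
        # Keep a loop only if it still encloses at least one component.
        return [Node(LOOP, node.name, kids)] if kids else []
    # Ordinary call: splice it out and lift whatever survived below it.
    return kids


def build_clct(root: Node, component_classes: Set[str]) -> Node:
    """Prune everything under `root`, keeping the root itself as the entry point."""
    kids = [k for c in root.children for k in to_clct(c, component_classes)]
    return Node(root.kind, root.name, kids)


def render(node: Node, depth: int = 0) -> List[str]:
    """Flatten the tree into indented 'kind: name' lines."""
    lines = ["  " * depth + f"{node.kind}: {node.name}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical loop-call tree of a small simulation program.
root = Node(CALL, "main", [
    Node(CALL, "InitComponent::initialize()", [
        Node(LOOP, "loop_0", [Node(CALL, "memset")]),
    ]),
    Node(LOOP, "time_loop", [
        Node(CALL, "advance(double)", [
            Node(CALL, "StepComponent::compute(double)", [Node(LOOP, "cell_loop")]),
        ]),
        Node(CALL, "log_status()"),
    ]),
    Node(LOOP, "io_loop", [Node(CALL, "write()")]),
])
components = {"InitComponent", "StepComponent"}

print("\n".join(render(build_clct(root, components))))
```

In this sketch the internals of each component call are hidden, non-component calls are spliced out, and loops that enclose no component disappear, so only the component calls and the loops around them remain.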

CLC Number:

  • TP302