Computer Science ›› 2023, Vol. 50 ›› Issue (11): 15-22. doi: 10.11896/jsjkx.220900250

• High Performance Computing •

Fast Performance Evaluation Method for Processor Design

DENG Lin, ZHANG Yao, LUO Jiahao

  1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2022-09-26 Revised: 2023-03-17 Online: 2023-11-15 Published: 2023-11-06
  • Corresponding author: DENG Lin (denglin@nudt.edu.cn)
  • About author: DENG Lin, born in 1980, Ph.D., associate research fellow, is a member of China Computer Federation. His main research interests include computer architecture, microprocessor design, integrated circuit verification and basic software.
  • Supported by:
    High-level Scientific and Technological Innovation Talent Project Candidates Independent Research Projects (22-TDRCJH-02-006).

Abstract: With processor designs growing more complex and design cycles remaining limited, every processor design team must find a way to evaluate performance quickly and effectively. A complete performance test suite takes a long time to run, and in the pre-silicon verification phase in particular, the high time cost prevents design teams from using the full suite for performance evaluation and analysis. This paper presents Fast-Eval, a fast performance evaluation method for general-purpose processors. Fast-Eval is based on the SimPoint technique and combines the FastParallel-BBV method, selection of optimal simulation points, and live migration of simulation points to significantly shorten both BBV generation time and performance test time. Experimental results show that, compared with the performance data obtained by running the complete SPEC CPU 2006 benchmarks with the REF input size, the proposed method on an ARM64 processor reduces BBV generation time to 16.88% of the original and performance evaluation time to 1.26% of the original, with an average relative error of 0.53% in the performance evaluation results. On an FPGA development board, the average relative error over the test suite reaches 0.40%, and the running time is only 0.93% of the full running time.

Key words: Rapid BBV generation, Performance evaluation, SimPoint, Processor, Verification
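
The SimPoint idea that the abstract builds on can be illustrated with a minimal Python sketch. It is not the Fast-Eval implementation: the function names, the toy BBVs, the cluster labels and the per-interval CPI values are all hypothetical, and the clustering step (k-means in SimPoint) is assumed to have been done already. The sketch picks one simulation point per cluster (the interval closest to the cluster centroid), weights it by the share of intervals in its cluster, and compares the weighted CPI estimate with the full-run average.

```python
from collections import defaultdict


def pick_simulation_points(bbvs, labels):
    """Return one (interval_index, weight) pair per cluster.

    bbvs   : list of basic block vectors, one per execution interval
    labels : cluster id of each interval (assumed to come from k-means)
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    points = []
    for label, members in sorted(clusters.items()):
        dim = len(bbvs[members[0]])
        # Centroid of the cluster in BBV space.
        centroid = [sum(bbvs[i][d] for i in members) / len(members)
                    for d in range(dim)]

        def dist2(i):
            # Squared Euclidean distance of interval i to the centroid.
            return sum((bbvs[i][d] - centroid[d]) ** 2 for d in range(dim))

        representative = min(members, key=dist2)
        weight = len(members) / len(labels)
        points.append((representative, weight))
    return points


def estimate_cpi(points, cpi_of_interval):
    """Weighted CPI estimate using only the selected simulation points."""
    return sum(weight * cpi_of_interval[i] for i, weight in points)


def relative_error(estimate, reference):
    """Relative error of the estimate with respect to the full-run value."""
    return abs(estimate - reference) / reference


if __name__ == "__main__":
    # Hypothetical data: six intervals, two program phases.
    bbvs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
            [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
    labels = [0, 0, 0, 1, 1, 1]
    cpi = [1.0, 1.2, 1.4, 2.0, 2.2, 2.4]

    points = pick_simulation_points(bbvs, labels)
    estimate = estimate_cpi(points, cpi)
    full_run = sum(cpi) / len(cpi)
    print(points, estimate, relative_error(estimate, full_run))
```

Only the selected intervals need to be executed on the target, which is where the reported reduction in evaluation time comes from; the relative error measures how much accuracy is given up in exchange.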

CLC Number:

  • TP306