Computer Science ›› 2011, Vol. 38 ›› Issue (5): 287-289.
• Architecture •
刘勇燕,刘勇鹏,冯华,迟万庆
LIU Yong-yan,LIU Yong-peng,FENG Hua,CHI Wan-qing
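Abstract (translated from the Chinese): Checkpointing is an important fault-tolerance mechanism in high-performance parallel computing systems. As system scale grows, the scalability of parallel checkpointing is constrained by file access. Targeting the multi-level file system structure of large-scale parallel computing systems, this paper proposes a cache-style parallel checkpointing technique. It converts globally synchronized parallel checkpointing into local file operations and uses the multiprocessor structure to schedule write-backs as an out-of-order pipeline, spreading the checkpoint write-back times sensibly. This effectively hides the write-back overhead and keeps parallel checkpoint file access high-performing and highly scalable.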
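Keywords (translated from the Chinese): Cache-style checkpointing, Parallel computing, Multi-level file system, Multiprocessor, Out-of-order pipeline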
Abstract: Checkpointing is a typical fault-tolerance technique, but its scalability is limited by the overhead of file access. For multi-level file system architectures, cache-style parallel checkpointing is introduced. It translates globally coordinated checkpointing into local file operations and schedules checkpoint flushing with out-of-order pipelining. The write-back overhead is thereby hidden effectively, which increases the performance and scalability of parallel checkpointing.
Key words: Cache-style checkpointing, Parallel computing, Multi-level file system, Multi-processor, Out-of-order pipeline
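The idea in the abstract, a fast local write followed by a deferred write-back whose completion order is not fixed, can be illustrated with a small sketch. This is not the authors' implementation: the class `CacheStyleCheckpointer`, its methods, the file naming scheme, and the use of a thread pool to stand in for spare processors are all assumptions made for illustration, and two temporary directories play the roles of the local and global file system levels.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


class CacheStyleCheckpointer:
    """Toy model (hypothetical): checkpoints land in a local 'cache'
    directory first and are flushed to a 'global' directory later by
    background workers."""

    def __init__(self, local_dir, global_dir, flush_workers=2):
        self.local_dir = local_dir
        self.global_dir = global_dir
        # The worker pool stands in for processors not busy computing.
        self.pool = ThreadPoolExecutor(max_workers=flush_workers)
        self.pending = []  # futures for write-backs still in flight

    def checkpoint(self, rank, step, state):
        """Write the state locally, queue its write-back, and return."""
        name = f"rank{rank}_step{step}.ckpt"
        local_path = os.path.join(self.local_dir, name)
        with open(local_path, "wb") as f:
            f.write(state)
        # The caller does not wait for the global copy to finish.
        self.pending.append(self.pool.submit(self._write_back, name))
        return local_path

    def _write_back(self, name):
        """Copy one local checkpoint into the global directory."""
        src = os.path.join(self.local_dir, name)
        dst = os.path.join(self.global_dir, name)
        tmp = dst + ".part"
        shutil.copyfile(src, tmp)
        # Rename within the same directory so a reader never sees a
        # half-written file under the final name.
        os.replace(tmp, dst)
        return name

    def drain(self):
        """Wait for all queued write-backs; they may finish in any order."""
        done = [f.result() for f in as_completed(self.pending)]
        self.pending.clear()
        return done

    def close(self):
        self.pool.shutdown(wait=True)


def demo():
    """Checkpoint four simulated ranks and flush them to 'global' storage."""
    with tempfile.TemporaryDirectory() as local_dir, \
            tempfile.TemporaryDirectory() as global_dir:
        ckpt = CacheStyleCheckpointer(local_dir, global_dir)
        for rank in range(4):
            ckpt.checkpoint(rank, step=0, state=bytes([rank]) * 1024)
        flushed = ckpt.drain()
        ckpt.close()
        return sorted(flushed), sorted(os.listdir(global_dir))
```

Calling `demo()` returns the sorted names of the flushed checkpoints together with the sorted contents of the global directory. In this sketch `checkpoint` returns as soon as the local file is written, which is the sense in which the write-back cost is hidden from the computing process.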
刘勇燕,刘勇鹏,冯华,迟万庆. 面向大规模计算系统的Cache式并行检查点[J]. 计算机科学, 2011, 38(5): 287-289. https://doi.org/
LIU Yong-yan,LIU Yong-peng,FENG Hua,CHI Wan-qing. Cache-style Parallel Checkpointing for Large-scale Computing System[J]. Computer Science, 2011, 38(5): 287-289. https://doi.org/
Link to this article: https://www.jsjkx.com/CN/
https://www.jsjkx.com/CN/Y2011/V38/I5/287