Computer Science, 2022, Vol. 49, Issue (10): 52-58. doi: 10.11896/jsjkx.210800091
LI Ming-liang, PANG Jian-min, YUE Feng (李明亮, 庞建民, 岳峰)
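Abstract: In parallel programs, mutex locks are commonly used to avoid conflicts when accessing shared resources. The SW26010 is the heterogeneous many-core processor used in the Sunway TaihuLight supercomputer, and it provides no hardware mutex mechanism among its many cores. Its developers implemented a software mutex based on atomic operations, but under heavy lock contention this software lock incurs a large lock-operation overhead that degrades the performance of parallel programs. To address this problem, this paper proposes HDT-LOCK, a distributed lock-passing mechanism. First, a hybrid distributed lock built on the scratchpad memory of the many cores and on main memory is proposed and implemented to avoid memory-access congestion. Second, a lock-passing mechanism based on register communication and single-instruction multiple-data (SIMD) instructions is designed to further increase the throughput of HDT-LOCK. Experimental results show that, compared with the original lock mechanism, HDT-LOCK avoids memory-access congestion and scales better. In addition, the lock-passing mechanism raises the throughput of HDT-LOCK by a factor of up to 5.6.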
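The abstract contrasts a centralized software mutex, where every contending core retries atomic operations on one shared word, with a distributed lock in which each waiter polls local state and the holder passes the lock directly to the next waiter. The sketch below illustrates only that general idea, using a classic MCS-style queue lock (Mellor-Crummey and Scott) written with portable C11 atomics and POSIX threads for an ordinary shared-memory machine. It is not HDT-LOCK: the SW26010 scratchpad memory, register communication and SIMD instructions the paper relies on are not modeled, and the names `mcs_acquire`, `mcs_release`, `shared_state` and `run_counter_test` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64

/* One queue node per acquisition. A waiter spins only on its own node,
   loosely analogous to a core polling local storage instead of a shared
   word in main memory (an analogy, not the SW26010 mechanism). */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next; /* successor in the wait queue */
    atomic_bool locked;              /* true while this waiter must wait */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail; /* last waiter; NULL when the lock is free */
} mcs_lock;

static void mcs_acquire(mcs_lock *lock, mcs_node *me) {
    /* The node is not yet visible to other threads, so plain init is safe. */
    atomic_init(&me->next, NULL);
    atomic_init(&me->locked, true);
    /* Enqueue with a single atomic swap on the shared tail word. */
    mcs_node *prev = atomic_exchange_explicit(&lock->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return; /* queue was empty: lock acquired immediately */
    /* Link behind the predecessor, then spin on our own flag only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        sched_yield(); /* concession to oversubscribed machines */
}

static void mcs_release(mcs_lock *lock, mcs_node *me) {
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to mark the lock free. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected, NULL,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire))
            return;
        /* A successor swapped the tail but has not linked itself yet. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            sched_yield();
    }
    /* Hand the lock directly to the successor (the "passing" step). */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

typedef struct {
    mcs_lock lock;
    long counter; /* protected by lock */
    int iters;
} shared_state;

static void *worker(void *arg) {
    shared_state *s = arg;
    for (int i = 0; i < s->iters; i++) {
        mcs_node node; /* fresh node for each acquisition */
        mcs_acquire(&s->lock, &node);
        s->counter++; /* critical section */
        mcs_release(&s->lock, &node);
    }
    return NULL;
}

/* Hypothetical helper: nthreads threads each increment a shared counter
   iters times under the lock; returns the final counter value. */
long run_counter_test(int nthreads, int iters) {
    pthread_t threads[MAX_THREADS];
    shared_state s;
    int created = 0;
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_init(&s.lock.tail, NULL);
    s.counter = 0;
    s.iters = iters;
    while (created < nthreads &&
           pthread_create(&threads[created], NULL, worker, &s) == 0)
        created++;
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    return s.counter;
}
```

For example, `run_counter_test(4, 10000)` should return 40000 if all four threads start, since every increment happens inside the lock. Contention on the shared `tail` word is limited to one atomic swap per acquisition, and release touches only the successor's own flag; the `sched_yield()` calls in the spin loops are an assumption added for oversubscribed test machines and are not part of the MCS algorithm itself.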