Computer Science ›› 2022, Vol. 49 ›› Issue (10): 52-58. doi: 10.11896/jsjkx.210800091

• High Performance Computing •

Distributed Lock with Inter-core Passing for SW26010 Processor

LI Ming-liang, PANG Jian-min, YUE Feng   

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, PLA Information Engineering University, Zhengzhou 450000, China
  • Received: 2021-08-11 Revised: 2022-01-07 Online: 2022-10-15 Published: 2022-10-13
  • About author: LI Ming-liang, born in 1991, Ph.D. His main research interests include high-performance computing and binary translation.
    PANG Jian-min, born in 1964, Ph.D., professor, Ph.D. supervisor, is a member of the China Computer Federation. His main research interests include high-performance computing and information security.
  • Supported by:
    National Natural Science Foundation of China (61472447, 61802433, 61802435).

Abstract: In parallel programs, a mutual exclusion lock is often used to avoid conflicts when accessing shared resources. The SW26010 processor, deployed in the Sunway TaihuLight supercomputer, is a heterogeneous many-core processor whose co-processing cores have no hardware lock mechanism. Developers have built a software lock based on atomic instructions, but it incurs significant overhead and degrades the performance of parallel programs. To address this issue, this paper proposes HDT-LOCK, a distributed lock mechanism with inter-core passing. First, a hybrid distributed lock is designed and implemented on the scratchpad memory of the co-processing cores to mitigate memory congestion. Second, an inter-core passing mechanism based on register communication and single-instruction multiple-data (SIMD) instructions is developed to improve the throughput of HDT-LOCK. Experimental results show that HDT-LOCK mitigates memory congestion and has better scalability. In addition, the lock passing mechanism improves HDT-LOCK throughput by up to 5.6x.
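
The general idea of lock passing can be illustrated with a small sketch. This is not the HDT-LOCK implementation, whose details the abstract does not give. It is a toy model in Python, assuming that a per-thread `threading.Event` can stand in for an inter-core message channel such as register communication, and that a small guard lock can stand in for an atomic instruction. The names `HandoffLock` and `run_counter` are hypothetical. On release, ownership goes straight to the next waiter instead of being written back to a shared flag that all waiters poll.

```python
import threading
from collections import deque


class HandoffLock:
    """Toy lock: release() passes ownership directly to the next waiter.

    Illustrative only. The per-waiter Event is an assumed stand-in for an
    inter-core message; it is not how SW26010 register communication works.
    """

    def __init__(self):
        self._guard = threading.Lock()  # stands in for an atomic operation
        self._held = False
        self._waiters = deque()         # FIFO queue of waiter "mailboxes"

    def acquire(self):
        with self._guard:
            if not self._held:
                self._held = True       # uncontended: take the lock
                return
            mailbox = threading.Event()
            self._waiters.append(mailbox)
        mailbox.wait()                  # wait for the holder to pass the lock

    def release(self):
        with self._guard:
            if self._waiters:
                # Pass ownership directly; _held stays True, so late
                # arrivals still queue up behind the new owner.
                self._waiters.popleft().set()
            else:
                self._held = False


def run_counter(num_threads=8, increments=1000):
    """Increment a shared counter under the lock and return the total."""
    lock = HandoffLock()
    total = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            total[0] += 1               # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Calling `run_counter(4, 500)` should return 2000 if mutual exclusion holds. In this model, waiters block on their own mailbox rather than repeatedly reading one shared location, which is the kind of contention a passing scheme aims to avoid.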

Key words: SW26010 processor, Hybrid distributed lock, Inter-core passing, Single-instruction multiple-data instruction, Register communication

CLC Number: 

  • TP319