计算机科学 ›› 2017, Vol. 44 ›› Issue (7): 47-56.doi: 10.11896/j.issn.1002-137X.2017.07.009

• 2016 年全国理论计算机科学学术年会 • 上一篇    下一篇

面向通讯同步的多处理器阵列重构

吴亚兰,武继刚,姜文超,刘竹松   

  1. 广东工业大学计算机学院 广州510006,广东工业大学计算机学院 广州510006,广东工业大学计算机学院 广州510006,广东工业大学计算机学院 广州510006
  • 出版日期:2018-11-13 发布日期:2018-11-13
  • 基金资助:
    本文受国家自然科学基金项目(61572144),广东省科技计划应用专项基金(2015B010129014),广东省自然科学基金项目(2016A030313703)资助

Reconfiguring Multiprocessor Arrays for Synchronous Communication

WU Ya-lan, WU Ji-gang, JIANG Wen-chao and LIU Zhu-song   

  • Online:2018-11-13 Published:2018-11-13

摘要: 从多处理器阵列中获取所需大小并且同步通讯性能优良的子阵列,是高性能拓扑重构的核心问题之一。基于不同的逻辑列剔除策略提出了3种面向通讯同步的拓扑重构算法:基于分治思想剔除逻辑列的重构算法(SCA_01),该算法能够使被优化的逻辑列相对均匀地分布在物理阵列中;优先剔除长逻辑列的贪心重构算法(SCA_02),该算法能够使被优化的逻辑列的长链接总数最少;基于分治与长链接数的混成重构算法(SCA_03),该算法将某一区域内的最长逻辑列剔除,且尽可能将剩余逻辑列均匀分布在物理阵列中。同时,对逻辑阵列的最大通讯延时给出了下界的求解算法。实验结果表明,3种算法在故障率小于1%、逻辑列的剔除率超过20%时,算法重构出的逻辑阵列的通讯延时特别接近计算出的性能下界。在多数情况下SCA_01优于SCA_02和SCA_03,而后两者的性能相近。在小阵列上且故障率与剔除率较小时,SCA_02具有性能优势,但在大阵列上SCA_03具有优势。在32×32的阵列上,SCA_01构造的阵列产生的通讯延时较SCA_02和SCA_03产生的延时平均减少25%,并且运行速度也提升了19.4%。

关键词: VLSI阵列,拓扑重构,容错,分治,算法

Abstract: Reconfiguring VLSI arrays to get a logical array with given size and synchronous communication is one of the key problems in reconfigurable topology for high performance computing.This paper presented three algorithms based on three different strategies of excluding logical co-lumns.The first algorithm,named SCA_01,can make the logical co-lumns in the uniform distribution in the host array,based on the divide-and-conquer for excluding logical columns.The second algorithm,named SCA_02,can minimize the number of the long interconnects of the logical array,based on the excluding the long logical column in priority.The third algorithm,named SCA_03,keeps the logical columns distributed in uniform way based on the hybrid strategy from excluding the long logical column and divide-and-conquer.In addition,this paper contributed an algorithm to calculate the lower bound of the communication delay for the given logical array.Simulation results show that,the communication delay of the logical array reconstructed by three algorithms is close to the lower bound proposed in this paper,when fault rate is less than 1% and the exclusion rate of logical columns is over 20%.Algorithm SCA_01 is superior to SCA_02 and SCA_03 in most cases,while SCA_02 and SCA_03 have nearly the same performance.But SCA_02 is better on smaller arrays and SCA_03 is better on large arrays,when the fault rate and exclusion rate are relatively small.The communication delay generated by SCA_01 is less than that of SCA_02 and SCA_03 by 25% on 32×32 host arrays.Moreover,SCA_01 is faster than the other two algorithms,and the running time is saved by 19.4%.It is concluded that SCA_01 is one of the relatively desirable algorithms to generate the logical arrays with minimum communication delay for high performance computing.

Key words: VLSI array,Topology reconstruction,Fault-tolerance,Divide and conquer,Algorithm

[1] NEGRINI R,SAMI M G,STEFANELLI R.Fault tolerancethrough reconfiguration in VLSI and WSI arrays[M].Cambridge,MA;The MIT Press,1989.
[2] CHEN Y Y,UPADHYAYA S J,CHENG C H.A Comprehensive Reconfiguration Scheme for Fault-Tolerant VLSI/WSI Array Processors [J].IEEE Trans.on Computers,1997,46(12):1363-1371.
[3] HORITA T,TAKANAMI I.Fault-Tolerant Processor ArraysBased on the 1.5-Track Switches with Flexible Spare Distributions [J].IEEE Trans.on Computers,2000,49(6):542-552.
[4] ZHANG C,CHENG T.A self-reconfigurable camera array [J].Eurographics Symposium on Rendering,2004,40(6):243-254.
[5] HSU C L,CHENG C H,LIU Y.Built-in self-detection/correction architecture for motion estimation computing arrays [J].IEEE Trans.Very Large Scale Integration Systems,2010,18(2):319-324.
[6] SCHBER,VOLKER,PAUL S,et al.Memory Built-In Self-Repair using redundant words [C]∥IEEE International Test Conference (TC).2001:995-1001.
[7] KUO S Y,CHEN I Y.Efficient Reconfiguration Algorithms for Degradable VLSI/WSI Arrays [J].IEEE Trans.Computer-Aided Design of Integrated Circuits and Systems,1992,11(10):1289-1300.
[8] LOW C P.An Efficient Reconfiguration Algorithm for Degra-dable VLSI/WSI Arrays [J].IEEE Trans.on Computers,2000,49(6):553-559.
[9] WU J G,HAN X G.New upper bound of target array for reconfigurable VLSI arrays [C]∥2011 International Conference of Electron Devices and Solid-State Circuits (EDSSC).IEEE.2011,1387:1-2.
[10] XU X,SHEN Y Z,SUN X M,et al.A New Upper Bound for Reconfigurable Multiprocessor Array with Faults [J].Journal of Wuhan University,2011,57(6):483-488.(in Chinese) 徐雄,沈宇泽,孙学梅,等.可重构处理器阵列的容错上界 [J].武汉大学学报:理学版,2011,57(6):483-488.
[11] LOW C P,LEONG H W.On the Reconfiguration of Degradable VLSI/WSI Arrays [J].IEEE Trans.Computer-Aided Design of Integrated Circuits and Systems,1997,16(10):1213-1221.
[12] WU J G,SRIKANTHAN T.Partial Rerouting Algorithm for Reconfigurable VLSI Arrays [C]∥Proc.IEEE Int’l Symp.Circuits and Systems.2003:641-644.
[13] WU J G,SRIKANTHA T.Fast Reconfiguring Mesh-connected VLSI Arrays[C]∥Proc.IEEE Int’l Symp.Circuits and Systems.2004:949-952.
[14] WU J G,SRIKANTHA T.Accelerating Reconfiguration of Degradable VLSI Arrays [J].Proc.IEE Circuits,Devices and Systems,2006,153(4):383-389.
[15] FUKUSHI V,HORIGUCHI S.Reconfiguration Algorithm for Degradable Processor Arrays Based on Row and Column Rerou-ting[C]∥Proc.IEEE 19th Int’l Symp.Defect and Fault To-lerance in VLSI Systems.2004:496-504
[16] FUKUSHI M,FUKUSHIMA Y,HORIGUCHI S.A GeneticApproach for the Reconfiguration of Degradable Processor Arrays [J].Proc.IEEE 20th Int’l Symp.Defect and Fault Tole-rance in VLSI Systems.2005:63-71.
[17] FUKUSHIMA Y,FUKUSHI M,HORIGUCHI S.An Improved Reconfiguration Method for Degradable Processor Arrays Using Genetic Algorithm [C]∥Proc.IEEE 21st Int’l Symp.Defect and Fault To-lerance in VLSI Systems.2006:353-361.
[18] WU J G,SRIKANTHA T,WANG X.Integrated Row and Colu-mn Re-Routing for Reconfiguration of VLSI Arrays with 4-Port Switches [J].IEEE Trans.on Computers,2007,56(10):1387-1400.
[19] WU J G,SRIKANTHA T,HAN X.Preprocessing and Partial Rerouting Techniques for Accelerating Reconfiguration of Degradable VLSI Arrays [J].IEEE Trans.Very Large Scale Integration Systems,2010,18(2):315-319.
[20] ZHU Y B,WU J G,LAM S K,et al.Preprocessing technique for accelerating reconfiguration of degradable VLSI arrays [C]∥IEEE International Symposium on Circuits & Systems.2013:2424-2427.
[21] WU J G,SRIKANTHA T.Reconfiguration Algorithms for PowerEfficient VLSI Subarrays with 4-Port Switches [J].IEEE Trans.on Computers,2006,55(3):243-253.
[22] WU J G,SRIKANTHA T,JIANG G,et al.Constructing Sub-Arrays with Short Interconnects from Degradable VLSI Arrays [J].IEEE Trans.on Prallel and Distributed Systems,2014,25(4):929-938.
[23] WU J G,NIU Z P,ZHU Y B,et al.Reconfiguration algorithm for low temperature sub-array on VLSI/WSI arrays with faults [J].IEEE International Symposium on the Physical & Failure Analysis of Integrated Circuits,2011,1416(1):1-4.
[24] ZHU Y B,WU J G,LAM S K,et al.Reconfiguration Algorithms for Degradable VLSI Arrays with Switch Faults[C]∥IEEE International Conference on Parallel & Distributed Systems.2012:356-361.
[25] JIANG G,WU J,HA Y,et al.Reconfiguring Three-Dimensional Processor Arrays for Fault-Tolerance:Hardness and Heuristic Algorithms [J].IEEE Trans.on Computers,2015,64(10):2926-2939.
[26] ZHANG Y R,WU J G,DUAN X M.Improved Algorithm for Communication Synchronization Oil Reconfigurable Mesh with Faults [J].Computer Science,2012,39(3):295-298.(in Chinese) 张元瑞,武继刚,段新明.可重构矩阵的同步性能优化算法 [J].计算机科学,2012,39(3):295-298.
[27] CORMEN T H,LEISERSON C E,RIVEST R L,et al.Introduction to Algorithms [M].Massachusetts Institute of Technolo-gy Press,2009.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!