Computer Science ›› 2022, Vol. 49 ›› Issue (5): 363-370.doi: 10.11896/jsjkx.220100119
• Interdiscipline & Frontier • Previous Articles Next Articles
JI Ying-rui1, YUAN Liang2, ZHANG Yun-quan2
CLC Number:
[1]WOLF M E,LAM M S.A data locality optimizing algorithm[C]//Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation.1991:30-44. [2]SONG Y,LI Z.New tiling techniques to improve cache temporal locality[J].ACM SIGPLAN Notices,1999,34(5):215-228. [3]RIVERA G,TSENG C W.Tiling optimizations for 3D scientific computations[C]//Proceedings of the 2000 ACM/IEEE Confe-rence on Supercomputing(SC’00).IEEE,2000:32. [4]MENG J,SKADRON K.Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs[C]//Proceedings of the 23rd International Conference on Supercomputing.2009:256-265. [5]RASTELLO F,DAUXOIS T.Efficient tiling for an ode discrete integration program:Redundant tasks instead of trapezoidal shaped-tiles[C]//Proceedings of the IEEE IPDPS’02.Pisca-taway,NJ:IEEE,2002:138. [6]NGUYEN A,SATISH N,CHHUGANI J,et al.3.5-D blocking optimization for stencil computations on modern CPUs and GPUs[C]//Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing,Networking,Sto-rage and Analysis(SC’10).IEEE,2010:1-13. [7]DING C,HE Y.A ghost cell expansion method for reducingcommunications in solving PDE problems[C]//Proceedings of the 2001 ACM/IEEE Conference on Supercomputing(SC’01).IEEE,2001:55-55. [8]JIN G,MELLOR-CRUMMEY J,FOWLER R.Increasing temporal locality with skewing and recursive blocking[C]//Proceedings of the 2001 ACM/IEEE Conference on Supercomputing(SC’01).IEEE,2001:57-57. [9]WONNACOTT D.Achieving scalable locality with time ske-wing[J].International Journal of Parallel Programming,2002,30(3):181-221. [10]MALAS T,HAGER G,LTAIEF H,et al.Multicore-optimized wavefront diamond blocking for optimizing stencil updates[J].SIAM Journal on Scientific Computing,2015,37(4):C439-C464. [11]WELLEIN G,HAGER G,ZEISER T,et al.Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization[C]//2009 33rd Annual IEEE International Computer Software and Applications Conference.IEEE,2009:579-586. [12]BONDHUGULA U,HARTONO A,RAMANUJAM J,et al.A practical automatic polyhedral parallelizer and locality optimizer[C]//Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation.2008:101-113. [13]BANDISHTI V,PANANILATH I,BONDHUGULA U.Tiling stencil computations to maximize parallelism[C]//Proceedings of the International Conference on High Performance Computing,Networking,Storage and Analysis(SC’12).IEEE,2012:1-11. [14]PANANILATH I,ACHARYA A,VASISTA V,et al.An optimizing code generator for a class of lattice-boltzmann computations[J].ACM Transactions on Architecture and Code Optimization (TACO),2015,12(2):1-23. [15]GROSSER T,VERDOOLAEGE S,COHEN A,et al.The relation between diamond tiling and hexagonal tiling[J].Parallel Processing Letters,2014,24(3):1441002. [16]FRIGO M,STRUMPEN V.Cache oblivious stencil computa-tions[C]//Proceedings of the 19th Annual International Conference on Supercomputing.2005:361-366. [17]FRIGO M,STRUMPEN V.The cache complexity of mul-tithreaded cache oblivious algorithms[J].Theory of Computing Systems,2009,45(2):203-233. [18]STRZODKA R,SHAHEEN M,PAJAK D,et al.Cache obli-vious parallelograms in iterative stencil computations[C]//Proceedings of the 24th ACM International Conference on Supercomputing.2010:49-59. [19]TANG Y,CHOWDHURY R A,KUSZMAUL B C,et al.The pochoir stencil compiler[C]//Proceedings of the Twenty-third Annual ACM Symposium on Parallelism in Algorithms and Architectures.2011:117-128. [20]KRISHNAMOORTHY S,BASKARAN M,BONDHUGULA U,et al.Effective automatic parallelization of stencil computations[J].ACM Sigplan Notices,2007,42(6):235-244. [21]CUI H M,WANG L,FAN D R,et al.Landing stencil code on Godson-T[J].Journal of Computer Science and Technology,2010,25(4):886-894. [22]WONNACOTT D.Using time skewing to eliminate idle timedue to memory bandwidth and network limitations[C]//Proceedings 14th International Parallel and Distributed Processing Symposium(IPDPS 2000).IEEE,2000:171-180. [23]ZHOU X,GIACALONE J P,GARZARÁN M J,et al.Hierarchical overlapped tiling[C]//Proceedings of the Tenth International Symposium on Code Generation and Optimization.2012:207-218. [24]GROSSER T,COHEN A,KELLY P H J,et al.Split tiling for GPUs:automatic parallelization using trapezoidal tiles[C]//Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units.2013:24-31. [25]HENRETTY T,VERAS R,FRANCHETTI F,et al.A stencil compiler for short-vector simd architectures[C]//Proceedings of the 27th International ACM Conference on International Confe-rence on Supercomputing.2013:13-24. [26]STRZODKA R,SHAHEEN M,PAJAK D,et al.Cache accurate time skewing in iterative stencil computations[C]//2011 International Conference on Parallel Processing.IEEE,2011:571-581. [27]GROSSER T,COHEN A,HOLEWINSKI J,et al.Hybrid hexagonal/classical tiling for GPUs[C]//Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization.2014:66-75. [28]YUAN L,ZHANG Y,GUO P,et al.Tessellating stencils[C]//Proceedings of the International Conference for High Perfor-mance Computing,Networking,Storage and Analysis.2017:1-13. |
[1] | TIAN Bing-chuan, TIAN Chen, ZHOU Yu-hang, CHEN Gui-hai, DOU Wan-chun. Reducing Head-of-Line Blocking on Network in Hadoop Clusters [J]. Computer Science, 2022, 49(3): 11-22. |
[2] | ZHAO Hui-qun, WU Kai-feng. Big Data Valuation Algorithm [J]. Computer Science, 2020, 47(9): 110-116. |
[3] | TAO Yang,JI Rui-juan,YANG Li,WANG Jin. Study on Dynamic Priority Admission Control Algorithm in Heterogeneous Wireless Networks [J]. Computer Science, 2020, 47(3): 242-247. |
[4] | CHEN Dai-bin and YANG Xiao-mei. Block-coded Video Deblocking Based on Low-rank Tensor Recovery [J]. Computer Science, 2016, 43(9): 280-283. |
[5] | HU Bo and TAN Liang. Optimization of Communication Performance of RPC Client in HBase Architecture [J]. Computer Science, 2016, 43(4): 97-101. |
[6] | WANG Yue, ZHOU Cheng, XIONG Cheng-yi and SHU Zhen-yu. Enhanced Block Compressed Sensing of Images Based on Total Variation Using Texture Information [J]. Computer Science, 2016, 43(2): 307-310. |
[7] | PANG Ren-bo, ZHANG Yun-quan, TAN Guang-ming, XU Jian-liang, JIA Hai-peng and XIE Qing-chun. Parallelization of Hydrostatic Numerical Forecasting Model of Marginal Sea [J]. Computer Science, 2016, 43(1): 14-17. |
[8] | FAN Wen-chao, LI Xiao-yu, WEI Kai and CHEN Xing-lin. Moving Target Detection Based on Improved Gaussian Mixture Model [J]. Computer Science, 2015, 42(5): 286-288. |
[9] | WU Gui-ming, WANG Miao, XIE Xiang-hui, DOU Yong and GUO Song. Sparse Matrix Blocking Method for Custom Architecture [J]. Computer Science, 2015, 42(11): 63-64. |
[10] | TAN Ming-chao,DIAO Xing-chun and CAO Jian-jun. Survey on Entity Resolution [J]. Computer Science, 2014, 41(4): 9-12. |
[11] | TIAN Hong-jin and ZHAN Yin-wei. Moving Object Detection Based on Adaptive Image Blocking and SSIM [J]. Computer Science, 2014, 41(2): 119-122. |
[12] | ZHANG Qi-liang and CHEN Yong-sheng. Bi-directional Blocking Job Shop Model and Particle Swarm Optimization Algorithm for Train Scheduling Problem on Single-track Lines [J]. Computer Science, 2013, 40(12): 276-281. |
[13] | . Communication Optimization Algorithm Using Reordering Transformation and Loop Distribution [J]. Computer Science, 2012, 39(9): 296-301. |
[14] | . Pipeline Hardware Design and Optimization for H.264 Deblocking Filter [J]. Computer Science, 2011, 38(12): 288-292. |
[15] | WANG Xin,LU Zhi-bo. Locate Tampered Image Region Automatically Based on Inconsistency of ,JPEG Blocking Artifacts [J]. Computer Science, 2010, 37(2): 269-273. |
|