Computer Science ›› 2021, Vol. 48 ›› Issue (6): 10-18.doi: 10.11896/jsjkx.200700059
• Computer Architecture • Previous Articles Next Articles
ZHU Yu, PANG Jian-min, XU Jin-long, TAO Xiao-han, WANG Jun
CLC Number:
[1]RIVERA G,TSENG C W.Tiling optimizations for 3D scientific computations [C]//Proceedings of the 2000 ACM/IEEE Confe-rence on Supercomputing(SC’00).Piscataway:IEEE,2000:32. [2]YUAN L,ZHANG Y,GUO P,et al.Tessellating stencils [C]//Proceedings of the International Conference for High Perfor-mance Computing,Networking,Storage and Analysis.New York:ACM,2017:1-13. [3]WIJESINGHE T,SENEVIRATHNE K,SIRIWARDHANA C,et al.Parameterized Diamond Tiling for Parallelizing stencil computations [C]//2017 Moratuwa Engineering Research Conference (MERCon).Piscataway:IEEE,2017:99-104. [4]GUO P,YUAN L,ZHANG Y,et al. Parallel Stencil Algorithm Based on Tessellating [J].Journal of Frontiers of Computer Science and Technology,2019,13(2):181-194. [5]MURANUSHI T,MAKINO J.Optimal Temporal Blocking for Stencil Computation [J].Procedia Computer Science,2015,51:1303-1312. [6]WELLEIN G,HAGER G,ZEISER T,et al.Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization [C]//2009 33rd Annual IEEE International Computer Software and Applications Conference.Piscataway:IEEE,2009:579-586. [7]FRIGO M,STRUMPEN V.Cache oblivious stencil computa-tions [C]//Proceedings of the 19th Annual International Confe-rence on Supercomputing.New York:ACM,2005:361-366. [8]KRISHNAMOORTHY S,BASKARAN M,BONDHUGULAU,et al.Effective automatic parallelization of stencil computations [C]//Programming Language Design and Implementation.New York:ACM,2007:235-244. [9]NGUYEN A,SATISH N,CHHUGANI J,et al.3.5-D blocking optimization for stencil computations on modern CPUs and GPUs [C]//Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing,Networking,Sto-rage and Analysis(SC’10).Piscataway:IEEE,2010:1-13. [10]RAWAT P S,HONG C,RAVISHANKAR M,et al.Effectiveresource management for enhancing performance of 2D and 3D stencils on GPUs [C]//Proceedings of the 9th Annual Workshop on General Purpose Processing Using Graphics Processing Unit(GPGPU ’16).New York:ACM,2016:92-102. [11]SZUSTAK L,ROJEK K,OLAS T,et al. Adaptation of MPDATA Heterogeneous Stencil Computation to Intel Xeon Phi Coprocessor [J].Scientific Programming,2015(5):1-14. [12]WILLIAMS S,SHALF J,OLIKER L,et al. Scientific computing kernels on the cell processor [J].International Journal of Parallel Programming,2007,35(3):263-298. [13]AO Y,YANG C,WANG X,et al.26 pflops stencil computations for atmospheric modeling on sunway taihulight [C]//2017 IEEE International Parallel and Distributed Processing Sympo-sium (IPDPS).Piscataway:IEEE,2017:535-544. [14]MENG J,SKADRON K.Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs [C]//International Conference on Supercomputing.New York:ACM,2009:256-265. [15]RAWAT P S,VAIDYA M,SUKUMARAN-RAJAM A,et al. Domain-Specific Optimization and Generation of High-Perfor-mance GPU Code for Stencil Computations [J].Proceedings of the IEEE,2018,106(11):1902-1920 . [16]DATTA K,MURPHY M,VOLKOV V,et al.Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures [C]//Proceedings of the 2008 ACM/IEEE Con-ference on Supercomputing(SC’08).Piscataway:IEEE,2008:1-12. [17]MATSUMURA K,ZOHOURI H R,WAHIB M,et al.AN5D:automated stencil framework for high-degree temporal blocking on GPUs [C]//Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization.New York:ACM,2020 . [18]UNAT D,CAI X,BADEN S B.Mint:realizing CUDA perfor-mance in 3D stencil methods with annotated C [C]//International Conference on Supercomputing.New York:ACM,2011:214-224. [19]BONDHUGULA U,BANDISHTI V,PANANILATH I.Dia-mond Tiling:Tiling Techniques to Maximize Parallelism for Stencil Computations [J].IEEE Transactions on Parallel and Distributed Systems,2017,28(5):1285-1298. [20]WU Q,NI Y,HUANG X.Regional Ocean Model Parallel Optimization in “Sunway TaihuLight” [J].Journal of Computer Research and Development,2019,56(7):1556-1566. [21]LV X J,LIU Z,CHU X S,et al. Extreme-scale SimulationBased LBM Computing Fluid Dynamics Simulations [J].Computer Science,2020,47(4):13-17. [22]SAIDI S,TENDULKAR P,LEPLEY T,et al. Optimizing two-dimensional DMA transfers for scratchpad Based MPSoCs platforms [J].Microprocessors Microsystems,2013,37(8):848-857. [23]XU Z,LIN J,MATSUOKA S.Benchmarking sw26010 many-core processor [C]//2017 IEEE International Parallel and Distributed Processing Symposium Workshops(IPDPSW).Piscata-way:IEEE,2017:743-752. |
[1] | HE Ya-ru, PANG Jian-min, XU Jin-long, ZHU Yu, TAO Xiao-han. Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform [J]. Computer Science, 2021, 48(6): 34-40. |
[2] | LIU Xiao-nan, JING Li-na, WANG Li-xin, WANG Mei-ling. Large-scale Quantum Fourier Transform Simulation Based on SW26010 [J]. Computer Science, 2020, 47(8): 93-97. |
[3] | YUAN Xin-hui, LIN Rong-fen, WEI Di, YIN Wan-wang, XU Jin-xiu. Optimization of BFS on Domestic Heterogeneous Many-core Processor SW26010 [J]. Computer Science, 2020, 47(8): 98-104. |
[4] | LV Xiao-jing, LIU Zhao, CHU Xue-sen, SHI Shu-peng, MENG Hong-song, HUANG Zhen-chun. Extreme-scale Simulation Based LBM Computing Fluid Dynamics Simulations [J]. Computer Science, 2020, 47(4): 13-17. |
[5] | NI Hong, LIU Xin. Many-core Optimization for Sparse Triangular Solver Under Unstructured Grids [J]. Computer Science, 2019, 46(6A): 518-522. |
[6] | TAO Xiao-han, PANG Jian-min, GAO Wei, WANG Qi, YAO Jin-yang. Performance Optimization of FT Program Based on SW26010 Processor [J]. Computer Science, 2019, 46(4): 321-328. |
[7] | YIN Meng-jia, XU Xian-bin, XIONG Zeng-gang and ZHANG Tao. Quantitative Performance Analysis Model of Matrix Multiplication Based on GPU [J]. Computer Science, 2015, 42(12): 13-17. |
[8] | JIN Gang,WANG Lei,WANG Zhi-ying. Performance Evaluation Method for Asynchronous Circuit Based on Static Data Flow Structure [J]. Computer Science, 2009, 36(12): 231-234. |
|