Computer Science ›› 2019, Vol. 46 ›› Issue (2): 306-314.doi: 10.11896/j.issn.1002-137X.2019.02.047
• Interdiscipline & Frontier • Previous Articles Next Articles
ZHANG Yi-ran1, CHEN Long2, AN Xiang-zhe2, YAN Shen-gen3
CLC Number:
[1]HARRIS M.Optimizing parallel reduction in CUDA[OL].http://developer.download.nvidia.com/assets/cuda/files/reduction.pdf. [2]CATANZARO B.Opencl optimization case study:Simple reductions[OL].https://developer.amd.com/article/opencl-optimization-case-study-simple-reductions. [3]YAN S G,YUNQUAN Z,GUOPING L.Reduction algorithm optimization based on the OpenCL[J].Journal of Software,2011,22(S2):163-171. [4]OpenCV Library Open Source Project[OL].http://opencv. org. [5]BRADSKI G,KAEHLER A.Learning OpenCV:Computer vision with the OpenCV library[M].O’Reilly Media,Inc.,2008. [6]RAWAT P S,RASTELLO F,SUKUMARAN-RAJAM A,et al.Register optimizations for stencils on GPUs[C]∥Procee-dings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’18).New York,NY,USA,2018. [7]CECKA C.Low communication FMM-accelerated FFT on GPUs [C]∥Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis (SC’17).ACM,New York,NY,USA,2017. [8]ZHANG X X,TAN G M,XUE S B,et al.Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning[C]∥Proceedings of the 22nd ACM SIGPLAN Sympo-sium on Principles and Practice of Parallel Programming (PPoPP’17).New York,NY,USA,2017. [9]LI B,SUN J,ANNAVARAM M,et al.Elastic-Cache:GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management[C]∥Proceedings of 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).Orlando,FL,2017. [10]KIRK D B,WEN-MEI W H.Programming massively parallel processors:a hands-on approach[M].Newnes,2012. [11]WU E H,LIU Y Q.The General Computing based on GPU[J].Proceeding of Journal of Computer-Aided Design and Computer Graphics,2004,16(5):601-612. [12]OWENS J D,HOUSTON M,LUEBKE D.et al.GPU Computing[J].Proceedings of the IEEE,2008,96(5):879-899. [13]OWENS J D,LUEBKE D,GOVINDARAJU N,et al.A survey of general-purpose computation on graphics hardware[J].Computer Graphics Forum:Journal of the European Association for Computer Graphics,2007,26(1):80-113. [14]Compute unified device architecture programming guide[OL].http://www.nvidia.com/object/cuda_home.html. [15]Nvidia cuda C.programming guide[OL].https://docs.nvidia.com/cuda/cuda-c-programming-guide. [16]NVIDIA cuda C.Best practices guide[OL].https://docs. nvidia.com/cuda/cuda-c-best-practices-guide. [17]The opencl 1.2 speciffication[M].Khronos OpenCL Working Group,2012. [18]STONE J E,GOHARA D,SHI G.OpenCL:A parallel programming standard for heterogeneous computing systems[J].Computing in Science & Engineering,2010,12(1-3):66-73. [19]GASTER B,HOWES L,KAELI D R,et al.Heterogeneous Computing with OpenCL:Revised OpenCL 1[M].Newnes,2012. [20]ZOU H,WANG H Q,HUANG Y.GPU Accelerated Rainbow Tables Analysis of MD5 Hash Password[J].Journal of Chongqing University of Technology(Natural Science),2013,27(7):61-66.(in Chinese) 邹航,王华秋,黄勇.基于GPU加速的彩虹表分析MD5哈希密码[J].重庆理工大学学报(自然科学),2013,27(7):61-66. |
[1] | GUO Zheng-wei, FU Ze-wen, LI Ning, BAI Lan. Study on Acceleration Algorithm for Raw Data Simulation of High Resolution Squint Spotlight SAR [J]. Computer Science, 2022, 49(8): 178-183. |
[2] | ZONG Di-di, XIE Yi-wu. Model Medial Axis Generation Method Based on Normal Iteration [J]. Computer Science, 2022, 49(6A): 764-770. |
[3] | WANG Jin, LIU Jiang. GPU-based Parallel DILU Preconditioning Technique [J]. Computer Science, 2022, 49(6): 108-118. |
[4] | LI Fan, YAN Xing, ZHANG Xiao-yu. Optimization of GPU-based Eigenface Algorithm [J]. Computer Science, 2021, 48(4): 197-204. |
[5] | QI Yan-rong, ZHOU Xia-bing, LI Bin, ZHOU Qing-lei. FPGA-based CNN Image Recognition Acceleration and Optimization [J]. Computer Science, 2021, 48(4): 205-212. |
[6] | HU Rong, YANG Wang-dong, WANG Hao-tian, LUO Hui-zhang, LI Ken-li. Parallel WMD Algorithm Based on GPU Acceleration [J]. Computer Science, 2021, 48(12): 24-28. |
[7] | LIU Shi-fang, ZHAO Yong-hua, YU Tian-yu, HUANG Rong-feng. Efficient Implementation of Generalized Dense Symmetric Eigenproblem StandardizationAlgorithm on GPU Cluster [J]. Computer Science, 2020, 47(4): 6-12. |
[8] | ZUO Xian-yu, ZHANG Zhe, SU Yue-han, LIU Yang, GE Qiang, TIAN Jun-feng. Extraction Algorithm of NDVI Based on GPU Multi-stream Parallel Model [J]. Computer Science, 2020, 47(4): 25-29. |
[9] | KANG Lin-yao, TANG Bing, XIA Yan-min, ZHANG Li. GPU-accelerated Non-negative Matrix Factorization-based Parallel Collaborative Filtering Recommendation Algorithm [J]. Computer Science, 2019, 46(8): 106-110. |
[10] | LU Xian-hua, WANG Hong-jun. Design of Distributed News Clustering System Based on Big Data Computing Framework [J]. Computer Science, 2019, 46(11A): 220-223. |
[11] | ZHU Chao, WU Su-ping. Parallel Harris Feature Point Detection Algorithm [J]. Computer Science, 2019, 46(11A): 289-293. |
[12] | SU Qing-hua, FU Jing-chao, GU Han, ZHANG Shan-shan, LI Yi-fei, JIANG Fang-zhou, BAI Han-lin, ZHAO Di. Parallel Algorithm Design for Assisted Diagnosis of Prostate Cancer [J]. Computer Science, 2019, 46(11A): 524-527. |
[13] | LIU Yu-cheng, Richard·DING, ZHANG Ying-chao. Research on Pan-real-time Problem of Medical Detection Based on BPNNs Recognition Algorithm [J]. Computer Science, 2018, 45(6): 301-307. |
[14] | LIAO Xing, YUAN Jing-ling and CHEN Min-cheng. Parallel PSO Container Packing Algorithm with Adaptive Weight [J]. Computer Science, 2018, 45(3): 231-234. |
[15] | WU Yu, YAN Guang-hui, WANG Ya-fei, MA Qing-qing, LIU Yu-xuan. Parallel CP Tensor Decomposition Algorithm Combining with GPU Technology [J]. Computer Science, 2018, 45(11): 298-303. |
|