Computer Science, 2020, Vol. 47, Issue (8): 5-16. doi: 10.11896/jsjkx.200600045
YANG Wang-dong, WANG Hao-tian, ZHANG Yu-feng, LIN Sheng-le, CAI Qin-yun
YANG Wang-dong, Ph.D., professor. His main research interests include high-performance computing and parallel computing.