计算机科学 ›› 2020, Vol. 47 ›› Issue (8): 5-16. doi: 10.11896/jsjkx.200600045
阳王东, 王昊天, 张宇峰, 林圣乐, 蔡沁耘
YANG Wang-dong, WANG Hao-tian, ZHANG Yu-feng, LIN Sheng-le, CAI Qin-yun
Abstract: With the rapid growth in computing power demanded by applications such as artificial intelligence and big data, and the increasing diversity of application scenarios, heterogeneous hybrid parallel computing has become a major focus of research. This paper introduces the main heterogeneous computer architectures in use today, including CPU/coprocessor, CPU/many-core processor, CPU/ASIC, and CPU/FPGA systems. It then outlines how heterogeneous hybrid parallel programming models have changed as these architectures evolved: a model may be a redesign and re-implementation of an existing language, an extension of an existing heterogeneous programming language, directive-based heterogeneous programming, or container-based cooperative programming. The analysis shows that heterogeneous hybrid parallel architectures will further strengthen their support for AI while also improving the generality of software. The paper also reviews the key techniques of heterogeneous hybrid parallel computing, including parallel task partitioning, task mapping, data communication, and data access across heterogeneous processors, as well as parallel synchronization for heterogeneous cooperation and pipelined parallelism across heterogeneous resources. Based on these techniques, it identifies the challenges facing heterogeneous hybrid parallel computing, such as difficult programming, difficult porting, high data-communication overhead, complex data access, complex parallel control, and unbalanced resource loads. Finally, it analyzes these challenges and points out that breakthroughs in the core technologies are needed in the integration of general-purpose and AI-specific heterogeneous computing, seamless porting across heterogeneous architectures, unified programming models, integrated memory and computation (compute-in-memory), and intelligent task partitioning and assignment.
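To make the directive-based heterogeneous programming and CPU/GPU task partitioning mentioned in the abstract concrete, the following C/OpenMP sketch statically splits one loop between an offload device and the host CPU cores. It is a minimal illustration under assumed conditions (an OpenMP 4.5-capable compiler, a single offload device, an arbitrary 70/30 split ratio) rather than code taken from any of the surveyed works.

/*
 * Minimal sketch (illustrative, not from the surveyed works):
 * directive-based offload plus a static CPU/GPU task split.
 * Assumes an OpenMP 4.5-capable compiler; if no device is present,
 * the target region falls back to host execution.
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N     (1 << 20)
#define SPLIT 0.7            /* fraction of iterations offloaded; illustrative only */

int main(void) {
    double *x = malloc(N * sizeof *x);
    double *y = malloc(N * sizeof *y);
    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    int cut = (int)(N * SPLIT);   /* [0, cut) -> device, [cut, N) -> host CPU */

    #pragma omp parallel
    #pragma omp single
    {
        /* Device chunk: an asynchronous target task; the map clauses make
         * the host<->device data communication explicit. */
        #pragma omp target teams distribute parallel for nowait \
                map(to: x[0:cut]) map(tofrom: y[0:cut])
        for (int i = 0; i < cut; i++)
            y[i] += 2.0 * x[i];

        /* Host chunk: spread over the remaining CPU threads as tasks. */
        #pragma omp taskloop
        for (int i = cut; i < N; i++)
            y[i] += 2.0 * x[i];

        /* Wait for the offloaded chunk before leaving the region. */
        #pragma omp taskwait
    }

    printf("y[0]=%.1f  y[N-1]=%.1f (both should be 4.0)\n", y[0], y[N - 1]);
    free(x); free(y);
    return 0;
}

In a real hybrid code the split ratio would be derived from performance models of the two processors rather than fixed, and the map clauses correspond directly to the host-device data communication that the abstract lists as a major source of overhead.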
YANG Wang-dong, PhD, professor. His main research interests include high performance computing and parallel computing.