Computer Science, 2020, Vol. 47, Issue (1): 7-16. doi: 10.11896/jsjkx.181202409
• Computer Architecture •
YUAN Liang¹, ZHANG Yun-quan¹, BAI Xue-rui², ZHANG Guang-ting¹