Research on Locality-aware Design Mechanism of State-of-the-art Parallel Programming Languages

doi:10.11896/jsjkx.181202409

Abstract

Memory access locality is becoming an increasingly important factor in extracting performance from the increasingly complex memory hierarchies of current multi-core processors. This paper proposes and defines two kinds of locality: horizontal locality and vertical locality. It surveys state-of-the-art parallel programming languages and analyzes in detail, from these two perspectives, the methods and mechanisms by which these languages describe and control memory access locality. Finally, it summarizes future research directions for parallel programming languages, emphasizing the importance of integrating and supporting both horizontal and vertical locality in future language research.

Key words: Locality, Multi-core, Parallel programming language, Parallel programming model, Parallelism

CLC Number: TP312

YUAN Liang, ZHANG Yun-quan, BAI Xue-rui, ZHANG Guang-ting. Research on Locality-aware Design Mechanism of State-of-the-art Parallel Programming Languages[J]. Computer Science, 2020, 47(1): 7-16.
