Computer Science ›› 2024, Vol. 51 ›› Issue (9): 51-58.doi: 10.11896/jsjkx.230900079
• High Performance Computing • Previous Articles Next Articles
XU He1,2, ZHOU Tao1,2, LI Peng1,2, QIN Fangfang3, JI Yimu1,2
CLC Number:
[1]DEMMEL J.LAPACK:a portable linear algebra library for supercomputers[C]//IEEE Control Systems Society Workshop on Computer-Aided Control System Design.USA:IEEE,1989:1-7. [2]BULUÇ A,GILBERT J R.The Combinatorial BLAS:Design,implementation,and applications[J].The International Journal of High Performance Computing Applications,2011,25(4):496-509. [3]CHOI J,DONGARRA J J,POZO R,et al.ScaLAPACK:A sca-lable linear algebra library for distributed memory concurrent computers[C]//The Fourth Symposium on the Frontiers of Massively Parallel Computation.USA:IEEE Computer Society,1992:120-127. [4]CHOI J,DONGARRA J J,OSTROUCHOVL S,et al.Design and Implementation of the ScaLAPACK LU,QR,and Cholesky Factorization Routines [J].Scientific Programming,1996,5(3):173-184. [5]CHOI J,DONGARRAJ J,OSTROUCHOV L S,et al.A Pro-posal for a set of parallel basic linear algebra subprograms[C]//Applied Parallel Computing Computations in Physics,Chemistry and Engineering Science:Second International Workshop.Denmark:Springer,1996:107-114. [6]DONGARRAJ J,HAMMARLING S,HIGHAM N J,et al.The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems [J].Procedia Computer Science,2017,108:495-504. [7]CHEN G L,SUN G Z,ZHANG Y Q,et al.Study on ParallelComputing [J].Journal of Computer Science and Technology,2006,21(5):665-673. [8]CHOI J,WALKER D W,DONGARRA J J.Pumma:Paralleluniversal matrix multiplication algorithms on distributed memory concurrent computers [J].Concurrency Practice & Experience,1994,6(7):543-570. [9]LUO H W,WU Y J,SHANG H H.Many-core OptimizationMethod for the Calculation of Ab initio Polarizability[J].Computer Science,2023,50(6):1-9. [10]LIU L,LIU Q K,SONG X Y.Study on Parallel of GaussianElimination [J].Computer Engineering,2011,37(8):40-42. [11]MEAD J L,RENAUT R A,WELFERT B D.Stability of a Pivoting Strategy for Parallel Gaussian Elimination [J].BIT Nume-rical Mathematics,2001,41(3):633-639. [12]LI W Q.A Chasing Method for Solving Cyclic Tridiagonal Equations [J].Science & Technology Review,2009,27(14):69-72. [13]KONGHUA G,HU X,ZHANG L.A new iteration method for the matrix equation AX=B [J].Applied Mathematics and Computation,2007,187(2):1434-1441. [14]LIU J B,HE M.A Hybrid Parallelization Technique Based on OpenMP and VALU Acceleration for the Method of Moments Solution of the Surface Integral Equations [J].Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology,2014,34(1):50-55. [15]BILMES J,ASANOVIC K,CHIN C W,et al.Author retrospec-tive for optimizing matrix multiply using PHiPAC:a portable high-performance ANSI C coding methodology[C]//ACM International Conference on Supercomputing 25th Anniversary Volume.Germany:Association for Computing Machinery,2014:42-44. [16]WHALEY R C,PETITET A,DONGARRA J J.Automated empirical optimizations of software and the ATLAS project [J].Parallel Computing,2001,27(1):3-35. [17]KIM R,CHOI J,LEE M.Optimizing parallel GEMM routinesusing auto-tuning with Intel AVX-512[C]//Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region.China:Association for Computing Machine-ry,2019:101-110. [18]HASSAN S A,HEMEIDA A M,MAHMOUD M M M.Per-formance Evaluation of Matrix-Matrix Multiplications Using Intel's Advanced Vector Extensions(AVX) [J].Microprocessors &Microsystems,2016:47(B):369-374. [19]LIM R,LEE Y,KIM R,et al.OpenMP-based parallel implemen-tation of matrix-matrix multiplication on the intel knights landing[C]//Proceedings of Workshops of HPC Asia.Tokyo:Association for Computing Machinery,2018:63-66. [20]GUNEY M E,GOTO K,COSTA T B,et al.Optimizing Matrix Multiplication on Intel© Xeon Phi TH x200 Architecture[C]//2017 IEEE 24th Symposium on Computer Arithmetic(ARITH).UK:IEEE,2017:144-145. [21]ZHAI X L,ZHANG Y G,JIN A Z,et al.Parallel DVB-RCS2Turbo Decoding on Multi-core CPU[J].Computer Science,2023,50(6):22-28. [22]ZHANG Y C,DU K,HUANG T J.Heuristic Tree-Partition-Based Parallel Method for Biophysically Detailed Neuron Simulation [J].Neural computation,2023,35(4):627-644. [23]GNANASEKARAN A,DARVE E.Scalable low-rank factorization using a task-based runtime system with distributed memory[C]//Proceedings of the Platform for Advanced Scientific Computing Conference.Switzerland:Association for Computing Machinery,2022:Article 8. [24]DONGARRA J J,FAVERGE M,LTAIEF H,et al.Achievingnumerical accuracy and high performance using recursive tile LU factorization with partial pivoting [J].Concurrency and Computation:Practice and Experience,2014,26(7):1408-1431. [25]KWASNIEWSKI G,KABIC M,BEN-NUN T,et al.On the pa-rallel I/O optimality of linear algebra kernels:near-optimal LU factorization[C]//Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.Republic of Korea:Association for Computing Machinery,2021:463-464. [26]YU T,WANG L S,QIN X L.Lock-free Parallel Semi-naive Algorithm Based on Multi-core CPU[J].Computer Science,2023,50(6):29-35. [27]XIA J,CHENG C,ZHOU X,et al.Kunpeng 920:The first 7-nm chiplet-based 64-core arm soc for cloud services [J].IEEE Micro,2021,41(5):67-75. [28]MOLDOVANOVA O V,KURNOSOV M G.Auto-vectorization of loops on Intel 64 and Intel Xeon Phi:Analysis and evaluation[C]//International Conference on Parallel Computing Techno-logies.Switzerland:Springer International Publishing,2017:143-150. [29]FAN L L,QIAO Y H,LI J F,et al.CP2K Software Porting and Optimization Based on Domestic c86 Processor[J].Computer Science,2023,50(6):58-65. |
[1] | ZHAI Xulun, ZHANG Yongguang, JIN Anzhao, QIANG Wei, LI Mengbing. Parallel DVB-RCS2 Turbo Decoding on Multi-core CPU [J]. Computer Science, 2023, 50(6): 22-28. |
[2] | DING Yue, XU Chuanfu, QIU Haozhong, DAI Weixi, WANG Qingsong, LIN Yongzhen, WANG Zhenghua. Study on Cross-platform Heterogeneous Parallel Computing for Lattice Boltzmann Multi-phase Flow Simulations Based on SYCL [J]. Computer Science, 2023, 50(11): 32-40. |
[3] | FENG Chen, GU Jingjing. Efficient Distributed Training Framework for Federated Learning [J]. Computer Science, 2023, 50(11): 317-326. |
[4] | CHEN Xin, LI Fang, DING Hai-xin, SUN Wei-ze, LIU Xin, CHEN De-xun, YE Yue-jin, HE Xiang. Parallel Optimization Method of Unstructured-grid Computing in CFD for DomesticHeterogeneous Many-core Architecture [J]. Computer Science, 2022, 49(6): 99-107. |
[5] | ZHU Ruo-chen, YANG Chang-chun, ZHANG Deng-hui. EGOS-DST:Efficient Schema-guided Approach to One-step Dialogue State Tracking for Diverse Expressions [J]. Computer Science, 2022, 49(11A): 210900246-7. |
[6] | HUANG Jia-wei, LI Xiao-peng, LING Cheng. Scalable Parallel Computing Method for Conditional Likelihood Probability of Nucleotide Molecular Phylogenetic Tree Based on GPU [J]. Computer Science, 2022, 49(11A): 210800189-7. |
[7] | FU Tian-hao, TIAN Hong-yun, JIN Yu-yang, YANG Zhang, ZHAI Ji-dong, WU Lin-ping, XU Xiao-wen. Performance Skeleton Analysis Method Towards Component-based Parallel Applications [J]. Computer Science, 2021, 48(6): 1-9. |
[8] | HE Ya-ru, PANG Jian-min, XU Jin-long, ZHU Yu, TAO Xiao-han. Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform [J]. Computer Science, 2021, 48(6): 34-40. |
[9] | LI Fan, YAN Xing, ZHANG Xiao-yu. Optimization of GPU-based Eigenface Algorithm [J]. Computer Science, 2021, 48(4): 197-204. |
[10] | HU Rong, YANG Wang-dong, WANG Hao-tian, LUO Hui-zhang, LI Ken-li. Parallel WMD Algorithm Based on GPU Acceleration [J]. Computer Science, 2021, 48(12): 24-28. |
[11] | MA Meng-yu, WU Ye, CHEN Luo, WU Jiang-jiang, LI Jun, JING Ning. Display-oriented Data Visualization Technique for Large-scale Geographic Vector Data [J]. Computer Science, 2020, 47(9): 117-122. |
[12] | CHEN Guo-liang, ZHANG Yu-jie, . Development of Parallel Computing Subject [J]. Computer Science, 2020, 47(8): 1-4. |
[13] | YANG Wang-dong, WANG Hao-tian, ZHANG Yu-feng, LIN Sheng-le, CAI Qin-yun. Survey of Heterogeneous Hybrid Parallel Computing [J]. Computer Science, 2020, 47(8): 5-16. |
[14] | YANG Zong-lin, LI Tian-rui, LIU Sheng-jiu, YIN Cheng-feng, JIA Zhen, ZHU Jie. Streaming Parallel Text Proofreading Based on Spark Streaming [J]. Computer Science, 2020, 47(4): 36-41. |
[15] | DENG Ding-sheng. Application of Improved DBSCAN Algorithm on Spark Platform [J]. Computer Science, 2020, 47(11A): 425-429. |
|