Computer Science ›› 2024, Vol. 51 ›› Issue (12): 129-136.doi: 10.11896/jsjkx.231000110
• High Performance Computing •
ZHONG Zhenyu, LIN Yongliang, WANG Haotian, LI Dongwen, SUN Yufei, ZHANG Yuzhi
[1] BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020: 1877-1901.
[2] FEDUS W, ZOPH B, SHAZEER N. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity[J]. arXiv:2101.03961, 2021.
[3] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[4] HE K, ZHANG X, REN S, et al. Identity mappings in deep residual networks[C]//Computer Vision-ECCV 2016. Springer, 2016: 630-645.
[5] NVIDIA. CUDA toolkit[EB/OL]. https://developer.nvidia.com/cuda-toolkit.
[6] LU K, WANG Y, GUO Y, et al. MT-3000: a heterogeneous multi-zone processor for HPC[J]. CCF Transactions on High Performance Computing, 2022, 4(2): 150-164.
[7] AWAN A A, CHU C, SUBRAMONI H, et al. OC-DNN: exploiting advanced unified memory capabilities in CUDA 9 and Volta GPUs for out-of-core DNN training[C]//25th IEEE International Conference on High Performance Computing. IEEE, 2018: 143-152.
[8] MARKTHUB P, BELVIRANLI M E, LEE S, et al. DRAGON: breaking GPU memory capacity limits with direct NVM access[C]//Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. IEEE/ACM, 2018: 32:1-32:13.
[9] RHU M, GIMELSHEIN N, CLEMONS J, et al. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design[C]//49th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2016: 18:1-18:13.
[10] SHOEYBI M, PATWARY M, PURI R, et al. Megatron-LM: Training multi-billion parameter language models using model parallelism[J]. arXiv:1909.08053, 2019.
[11] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[12] HUANG Y, CHENG Y, BAPNA A, et al. GPipe: Efficient training of giant neural networks using pipeline parallelism[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019: 103-112.
[13] RASLEY J, RAJBHANDARI S, RUWASE O, et al. DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters[C]//Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2020: 3505-3506.
[14] RAJBHANDARI S, RASLEY J, RUWASE O, et al. ZeRO: memory optimizations toward training trillion parameter models[C]//Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE/ACM, 2020.
[15] REN J, RAJBHANDARI S, AMINABADI R Y, et al. ZeRO-Offload: Democratizing billion-scale model training[C]//2021 USENIX Annual Technical Conference. USENIX Association, 2021: 551-564.
[16] RAJBHANDARI S, RUWASE O, RASLEY J, et al. ZeRO-Infinity: breaking the GPU memory wall for extreme scale deep learning[C]//International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 2021.
[17] ZHAO Y, GU A, VARMA R, et al. PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel[J]. Proceedings of the VLDB Endowment, 2023, 16(12): 3848-3860.
[18] BI R, XU T, XU M, et al. PaddlePaddle: A Production-Oriented Deep Learning Platform Facilitating the Competency of Enterprises[C]//24th IEEE International Conference on High Performance Computing & Communications; 8th International Conference on Data Science & Systems; 20th International Conference on Smart City; 8th International Conference on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys). IEEE, 2022: 92-99.
[19] KIM T, KIM H, YU G, et al. BPipe: Memory-Balanced Pipeline Parallelism for Training Large Language Models[C]//Proceedings of Machine Learning Research: International Conference on Machine Learning. PMLR, 2023: 16639-16653.
[20] GONG C, LIU J, BAO W, et al. Review on Ecological Construction of Domestic High-performance Parallel Application Software in Post Moore Era[J]. Journal of System Simulation, 2022, 34(10): 2107-2118.
[21] DENG L. The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web][J]. IEEE Signal Processing Magazine, 2012, 29(6): 141-142.