Computer Science, 2026, Vol. 53, Issue (6): 102-116. doi: 10.11896/jsjkx.251000119
High Performance Computing
ZHU Huming, LIU Huijie, DONG Ximiao, CHEN Zhipeng, GAO Tianqi, JIAO Licheng