Computer Science, 2025, Vol. 52, Issue (6A): 240900125-5. doi: 10.11896/jsjkx.240900125
• Image Processing & Multimedia Technology •
LIU Bingzhi1, CAO Yin2, ZHOU Yi1
[1] HUANG R, HUANG J, YANG D, et al. Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models [J]. arXiv:2301.12661, 2023.
[2] SCHNEIDER F, JIN Z, SCHÖLKOPF B. Moûsai: Text-to-music generation with long-context latent diffusion [J]. arXiv:2301.11757, 2023.
[3] WANG Y, JU Z, TAN X, et al. Audit: Audio editing by following instructions with latent diffusion models [J]. arXiv:2304.00830, 2023.
[4] YUAN Y, LIU H, LIU X, et al. Text-driven foley sound generation with latent diffusion model [J]. arXiv:2306.10359, 2023.
[5] RUAN L, MA Y, YANG H, et al. MM-diffusion: Learning multi-modal diffusion models for joint audio and video generation [C] // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 10219-10228.
[6] YANG D, YU J, WANG H, et al. Diffsound: Discrete diffusion model for text-to-sound generation [J]. arXiv:2207.09983, 2022.
[7] LIU H, CHEN Z, YUAN Y, et al. AudioLDM: Text-to-audio generation with latent diffusion models [J]. arXiv:2301.12503, 2023.
[8] GHOSAL D, MAJUMDER N, MEHRISH A, et al. Text-to-audio generation using instruction-tuned LLM and latent diffusion model [J]. arXiv:2304.13731, 2023.
[9] SALIMANS T, HO J. Progressive distillation for fast sampling of diffusion models [J]. arXiv:2202.00512, 2022.
[10] HANG T, GU S, LI C, et al. Efficient diffusion training via min-SNR weighting strategy [J]. arXiv:2303.09556, 2023.
[11] CHUNG H W, HOU L, LONGPRE S, et al. Scaling instruction-finetuned language models [J]. arXiv:2210.11416, 2022.
[12] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network [J]. arXiv:1503.02531, 2015.
[13] OORD A, LI Y, BABUSCHKIN I, et al. Parallel WaveNet: Fast high-fidelity speech synthesis [C] // Proceedings of the International Conference on Machine Learning. 2018: 3918-3926.
[14] MENG C, ROMBACH R, GAO R, et al. On distillation of guided diffusion models [C] // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 14297-14306.
[15] CHANG H, ZHANG H, BARBER J, et al. Muse: Text-to-image generation via masked generative transformers [J]. arXiv:2301.00704, 2023.
[16] SONG Y, SOHL-DICKSTEIN J, KINGMA D P, et al. Score-based generative modeling through stochastic differential equations [J]. arXiv:2011.13456, 2020.
[17] RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation [C] // 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015). 2015: 234-241.
[18] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C] // Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[19] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models [J]. Advances in Neural Information Processing Systems, 2020, 33: 6840-6851.
[20] KIM C D, KIM B, LEE H, et al. AudioCaps: Generating captions for audios in the wild [C] // Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. 2019: 119-132.
[21] KILGOUR K, ZULUAGA M, ROBLEK D, et al. Fréchet audio distance: A reference-free metric for evaluating music enhancement algorithms [C] // Proceedings of Interspeech. 2019: 2350-2354.
[1] | LI Bo, MO Xian. Application of Large Language Models in Recommendation System [J]. Computer Science, 2025, 52(6A): 240400097-7. |
[2] | SHI Xincheng, WANG Baohui, YU Litao, DU Hui. Study on Segmentation Algorithm of Lower Limb Bone Anatomical Structure Based on 3D CT Images [J]. Computer Science, 2025, 52(6A): 240500119-9. |
[3] | CHEN Shijia, YE Jianyuan, GONG Xuan, ZENG Kang, NI Pengcheng. Aircraft Landing Gear Safety Pin Detection Algorithm Based on Improved YOLOv5s [J]. Computer Science, 2025, 52(6A): 240400189-7. |
[4] | ZHANG Hang, WEI Shoulin, YIN Jibin. TalentDepth: A Monocular Depth Estimation Model for Complex Weather Scenarios Based on Multiscale Attention Mechanism [J]. Computer Science, 2025, 52(6A): 240900126-7. |
[5] | CHENG Yan, HE Huijuan, CHEN Yanying, YAO Nannan, LIN Guobo. Study on Interpretable Shallow Class Activation Mapping Algorithm Based on Spatial Weights and Inter Layer Correlation [J]. Computer Science, 2025, 52(6A): 240500140-7. |
[6] | JIANG Haolun, ZHU Jinxia, MENG Xiangfu. Next Point of Interest Recommendation Incorporating Dynamic Social Relationships [J]. Computer Science, 2025, 52(6A): 240600003-7. |
[7] | ZHENG Chuangrui, DENG Xiuqin, CHEN Lei. Traffic Prediction Model Based on Decoupled Adaptive Dynamic Graph Convolution [J]. Computer Science, 2025, 52(6A): 240400149-8. |
[8] | WANG Chundong, ZHANG Qinghua, FU Haoran. Federated Learning Privacy Protection Method Combining Dataset Distillation [J]. Computer Science, 2025, 52(6A): 240500132-7. |
[9] | LIAO Sirui, HUANG Feihu, ZHAN Pengxiang, PENG Jian, ZHANG Linghao. DCDAD: Differentiated Context Dependency for Time Series Anomaly Detection Method [J]. Computer Science, 2025, 52(6): 106-117. |
[10] | WANG Teng, XIAN Yunting, XU Hao, XIE Songqi, ZOU Quanyi. Ship License Plate Recognition Network Based on Pyramid Transformer in Transformer [J]. Computer Science, 2025, 52(6): 179-186. |
[11] | WEI Xiaohui, GUAN Zeyu, WANG Chenyang, YUE Hengshan, WU Qi. Hardware-Software Co-design Fault-tolerant Strategies for Systolic Array Accelerators [J]. Computer Science, 2025, 52(5): 91-100. |
[12] | WU Pengyuan, FANG Wei. Study on Graph Collaborative Filtering Model Based on FeatureNet Contrastive Learning [J]. Computer Science, 2025, 52(5): 139-148. |
[13] | CONG Yingnan, HAN Linrui, MA Jiayu, ZHU Jinqing. Research on Intelligent Judgment of Criminal Cases Based on Large Language Models [J]. Computer Science, 2025, 52(5): 248-259. |
[14] | LIU Tengfei, CHEN Liyue, FANG Jiangyi, WANG Leye. SCFNet: Fusion Framework of External Spatial Features for Spatio-temporal Prediction [J]. Computer Science, 2025, 52(4): 110-118. |
[15] | ZHOU Yi, MAO Kuanmin. Research on Individual Identification of Cattle Based on YOLO-Unet Combined Network [J]. Computer Science, 2025, 52(4): 194-201. |