Computer Science, 2025, Vol. 52, Issue (11A): 241200164-10. doi: 10.11896/jsjkx.241200164
• Image Processing & Multimedia Technology •
WANG Zhongyuan, WANG Baoshan, WANG Yongjun, YUAN Tianhao