Computer Science ›› 2025, Vol. 52 ›› Issue (10): 231-238. doi: 10.11896/jsjkx.240800147

• Artificial Intelligence •

Novel Discrete Diffusion Text Generation Model with Convex Loss Function

LI Sihui1, CAI Guoyong2, JIANG Hang2, WEN Yimin1,3   

  1. College of Computer and Information Security,Guilin University of Electronic Technology,Guilin,Guangxi 541004,China
    2. Key Laboratory of Guangxi Trusted Software,Guilin,Guangxi 541004,China
    3. Guangxi Key Laboratory of Culture and Tourism Smart Technology,Guilin Tourism University,Guilin,Guangxi 541006,China
  • Received:2024-08-27 Revised:2024-11-17 Online:2025-10-15 Published:2025-10-14
  • About author:LI Sihui,born in 1999,postgraduate,is a member of CCF(No.W1626G).Her main research interests include natural language processing and non-autoregressive language modelling.
    CAI Guoyong,born in 1971,Ph.D,professor,Ph.D supervisor,is a distinguished member of CCF(No.12524D).His main research interests include multimodal affective computing,trustable AI theory and techniques.
  • Supported by:
    National Natural Science Foundation of China(62366010) and Key R&D Program of Guangxi(AB21220023).

Abstract: Diffusion language models adopt a non-autoregressive generation approach that improves inference speed, and their iterative refinement further enhances the quality of the generated text, making them promising for text generation tasks. However, because diffusion language models are usually trained with a cross-entropy loss based on maximum likelihood estimation, a model can be penalized even when it generates a correct sentence, simply because that sentence does not align strictly with the reference. This causes a serious multimodality problem that significantly degrades the quality of the generated text. To alleviate the multimodality problem, a discrete diffusion language model trained with a convex loss function, ConvexDiffusion, is proposed. The model leverages the property of convex functions to sharpen the optimal distribution, so that the model focuses more on high-probability outputs. To further improve generation quality and reduce the repetition rate of generated words, a hybrid-aware noise schedule that lets the noise level vary non-linearly is designed, along with a high-confidence deterministic denoising strategy employed during decoding. Experimental results on three text generation tasks (machine translation, question generation, and question paraphrasing) demonstrate that ConvexDiffusion achieves an improvement of 1 to 7 BLEU points and faster generation than leading diffusion models such as RDM and non-autoregressive models such as CMLM. Notably, on the two large datasets WMT16 EN-RO and WMT14 EN-DE, ConvexDiffusion surpasses the leading autoregressive models in text generation quality.
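For intuition, here is a minimal PyTorch sketch of the three mechanisms named in the abstract. It is an illustration rather than the paper's implementation: the loss follows the general convex-learning form of Shao et al. [12] (raising the token probability to a power gamma < 1), and the names `gamma`, `mask_id`, `keep_ratio`, as well as the polynomial schedule, are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def convex_token_loss(logits, targets, gamma=0.5, pad_id=0):
    """Convex surrogate for cross-entropy (sketch): penalize 1 - p(y)^gamma.
    For 0 < gamma < 1 this is convex in the token probability p(y) and its
    gradient scales with p(y)^gamma, so training pushes mass toward outputs
    the model already ranks highly, sharpening the optimal distribution."""
    log_probs = F.log_softmax(logits, dim=-1)                  # (B, T, V)
    tgt_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    valid = targets.ne(pad_id)                                 # ignore padding
    loss = 1.0 - (gamma * tgt_logp).exp()                      # 1 - p(y)^gamma
    return loss[valid].mean()

def nonlinear_noise_level(t, T, exponent=2.0):
    """Hypothetical non-linear schedule: the fraction of corrupted tokens at
    step t grows polynomially in t/T instead of linearly."""
    return (t / T) ** exponent

@torch.no_grad()
def confident_denoise_step(model, tokens, mask_id, keep_ratio=0.25):
    """One high-confidence deterministic denoising step: rather than sampling,
    commit the argmax prediction at the masked positions where the model is
    most confident, and leave the rest masked for later iterations."""
    probs = model(tokens).softmax(-1)                          # (B, T, V)
    conf, pred = probs.max(-1)                                 # argmax per position
    masked = tokens.eq(mask_id)
    conf = conf.masked_fill(~masked, -1.0)                     # only masked slots compete
    k = max(1, int(keep_ratio * tokens.size(1)))
    top = conf.topk(k, dim=-1).indices                         # most confident positions
    commit = torch.zeros_like(masked)
    commit.scatter_(-1, top, True)
    commit &= masked
    return torch.where(commit, pred, tokens)
```

Decoding under this sketch would start from a fully masked sequence and apply `confident_denoise_step` repeatedly until no masked positions remain.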

Key words: Diffusion model,Text generation,Multimodality problem,Loss function,Convex loss function

CLC Number: TP391

References
[1]YAN Z H,ZHOU C B,LI X C.A review of research on generative diffusion models[J].Computer Science,2024,51(1):273-283.
[2]DEMIRAG Y,LIU D,NIEHUES J.Benchmarking Diffusion Models for Machine Translation[C]//Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics:Student Research Workshop.2024:313-324.
[3]ZHENG L,YUAN J,YU L,et al.A reparameterized discrete diffusion model for text generation[J].arXiv:2302.05737,2023.
[4]LI Y,CUI L,YIN Y,et al.Multi-granularity optimization for non-autoregressive translation[J].arXiv:2210.11017,2022.
[5]SHAO C,ZHANG J,ZHOU J,et al.Rephrasing the reference for non-autoregressive machine translation[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2023:13538-13546.
[6]GHAZVININEJAD M,KARPUKHIN V,ZETTLEMOYER L,et al.Aligned cross entropy for non-autoregressive machine translation[C]//International Conference on Machine Learning.PMLR,2020:3515-3523.
[7]SOHL-DICKSTEIN J,WEISS E,MAHESWARANATHAN N,et al.Deep unsupervised learning using nonequilibrium thermodynamics[C]//International Conference on Machine Learning.PMLR,2015:2256-2265.
[8]HOOGEBOOM E,NIELSEN D,JAINI P,et al.Argmax flows and multinomial diffusion:Learning categorical distributions[J].Advances in Neural Information Processing Systems,2021,34:12454-12465.
[9]AUSTIN J,JOHNSON D D,HO J,et al.Structured denoising diffusion models in discrete state-spaces[J].Advances in Neural Information Processing Systems,2021,34:17981-17993.
[10]LIN S,LIU B,LI J,et al.Common diffusion noise schedules and sample steps are flawed[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.2024:5404-5411.
[11]GHAZVININEJAD M,LEVY O,LIU Y,et al.Mask-predict:Parallel decoding of conditional masked language models[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing.2019:6112-6121.
[12]SHAO C,MA Z,ZHANG M,et al.Beyond MLE:convex learning for text generation[J].Advances in Neural Information Processing Systems,2023,36:8913-8936.
[13]GONG S,LI M,FENG J,et al.Diffuseq:Sequence to sequence text generation with diffusion models[J].arXiv:2210.08933,2022.
[14]CETTOLO M,NIEHUES J,STÜKER S,et al.Report on the 11th IWSLT evaluation campaign[C]//Proceedings of the 11th International Workshop on Spoken Language Translation:Evaluation Campaign.2014:2-17.
[15]BOJAR O,CHATTERJEE R,FEDERMANN C,et al.Findings of the 2016 conference on machine translation(WMT16)[C]//First Conference on Machine Translation.Association for Computational Linguistics,2016:131-198.
[16]BOJAR O,BUCK C,FEDERMANN C,et al.Findings of the 2014 workshop on statistical machine translation[C]//Proceedings of the 9th Workshop on Statistical Machine Translation.2014:12-58.
[17]HUANG X S,PEREZ F,VOLKOVS M.Improving non-autoregressive translation models without distillation[C]//International Conference on Learning Representations.2022.
[18]HUANG S,DONG L,WANG W,et al.Language is not all you need:Aligning perception with language models[C]//Advances in Neural Information Processing Systems.2024.
[19]KASAI J,CROSS J,GHAZVININEJAD M,et al.Non-autoregressive machine translation with disentangled context transformer[C]//International Conference on Machine Learning.PMLR,2020:5144-5155.
[20]DIELEMAN S,SARTRAN L,ROSHANNAI A,et al.Continuous diffusion for categorical data[J].arXiv:2211.15089,2022.
[21]AUSTIN J,JOHNSON D D,HO J,et al.Structured denoising diffusion models in discrete state-spaces[J].Advances in Neural Information Processing Systems,2021,34:17981-17993.
[22]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.2017:6000-6010.
[23]PAPINENI K,ROUKOS S,WARD T,et al.Bleu:a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.2002:311-318.
[24]DHINGRA B,MAZAITIS K,COHEN W W.Quasar:Datasets for question answering by search and reading[J].arXiv:1707.03904,2017.
[25]SHARMA L,GRAESSER L,NANGIA N,et al.Natural language understanding with the quora question pairs dataset[J].arXiv:1907.01041,2019.
[26]ZHANG B,XIONG D,SU J.A GRU-gated attention model for neural machine translation[J].IEEE Transactions on Neural Networks and Learning Systems,2020,31(11):4688-4698.
[27]RAFFEL C,SHAZEER N,ROBERTS A,et al.Exploring the limits of transfer learning with a unified text-to-text transformer[J].Journal of Machine Learning Research,2020,21(140):1-67.
[28]GU J,WANG C,ZHAO J.Levenshtein transformer[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.Red Hook,NY:Curran Associates Inc.,2019:11181-11191.
[29]LIN C Y.Rouge:A package for automatic evaluation of summaries[C]//Text Summarization Branches Out.ACL,2004:74-81.
[30]ZHANG T,KISHORE V,WU F,et al.Bertscore:Evaluating text generation with bert[J].arXiv:1904.09675,2019.
[31]DESHPANDE A,ANEJA J,WANG L,et al.Fast,diverse and accurate image captioning guided by part-of-speech[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:10695-10704.