Computer Science ›› 2025, Vol. 52 ›› Issue (10): 231-238. doi: 10.11896/jsjkx.240800147

• Artificial Intelligence •


Novel Discrete Diffusion Text Generation Model with Convex Loss Function

LI Sihui1, CAI Guoyong2, JIANG Hang2, WEN Yimin1,3   

1. College of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
    2. Guangxi Key Laboratory of Trusted Software, Guilin, Guangxi 541004, China
    3. Guangxi Key Laboratory of Culture and Tourism Smart Technology, Guilin Tourism University, Guilin, Guangxi 541006, China
• Received: 2024-08-27 Revised: 2024-11-17 Online: 2025-10-15 Published: 2025-10-14
  • Corresponding author: CAI Guoyong (ccgycai@guet.edu.cn)
• About author: LI Sihui, born in 1999, postgraduate (22032201020@mails.guet.edu.cn), is a member of CCF (No. W1626G). Her main research interests include natural language processing and non-autoregressive language modelling.
    CAI Guoyong, born in 1971, Ph.D, professor, Ph.D supervisor, is a distinguished member of CCF (No. 12524D). His main research interests include multimodal affective computing, trustable AI theory and techniques.
  • Supported by:
National Natural Science Foundation of China (62366010) and Key R&D Program of Guangxi (AB21220023).


Abstract: Diffusion language models adopt a non-autoregressive generation approach that significantly improves inference speed, and their iterative refinement process continually improves the quality of the generated text, which makes them promising for text generation tasks. However, diffusion language models are usually trained with a cross-entropy loss based on maximum likelihood estimation, so even a correctly generated sentence may be penalized for not aligning strictly with the reference sentence. This exposes the model to a severe multimodality problem and greatly degrades generation quality. To alleviate the multimodality problem, this paper proposes ConvexDiffusion, a discrete diffusion language model trained with a convex loss function. The model exploits the property that convex functions sharpen the optimal distribution, so that it focuses more on high-probability outputs. To further improve generation quality and reduce the repetition rate of generated words, a hybrid-aware noise schedule that makes noise tokens vary non-linearly is designed, together with a high-confidence deterministic denoising strategy applied during decoding. Experimental results on three text generation tasks, namely machine translation, question generation, and question paraphrasing, show that ConvexDiffusion improves performance by 1-7 BLEU points over leading diffusion models such as RDM and non-autoregressive models such as CMLM, while generating text faster. In particular, on the two large datasets WMT16 EN-RO and WMT14 EN-DE, ConvexDiffusion outperforms the autoregressive language models that currently dominate text generation.
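The convex training objective described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration of the general idea of replacing the negative log-likelihood with a convex surrogate that sharpens the optimal distribution (here, minimising -p(target)**gamma for 0 < gamma <= 1). The function name convex_token_loss, the gamma parameter, and the exact form of the surrogate are assumptions made for illustration; they do not reproduce the exact ConvexDiffusion objective, its noise schedule, or its decoding strategy.

import torch
import torch.nn.functional as F

def convex_token_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 0.5) -> torch.Tensor:
    # Hypothetical convex surrogate for token-level cross-entropy.
    # MLE minimises -log p(target); this sketch instead minimises -p(target)**gamma,
    # which is convex in p(target) for 0 < gamma <= 1 and pushes the model to
    # concentrate probability mass on high-probability outputs.
    log_p = F.log_softmax(logits, dim=-1)                               # [batch, seq, vocab]
    log_p_target = log_p.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # log p of reference tokens
    return (-(gamma * log_p_target).exp()).mean()                       # mean of -p(target)**gamma

# Toy usage with random logits over a 100-token vocabulary.
logits = torch.randn(2, 5, 100, requires_grad=True)   # [batch=2, seq=5, vocab=100]
targets = torch.randint(0, 100, (2, 5))               # reference token ids
loss = convex_token_loss(logits, targets)
loss.backward()                                        # gradients flow as with ordinary cross-entropy

Because -p**gamma is bounded (unlike -log p, which grows without limit as p approaches 0), a correct output that is not strictly aligned with the reference is penalised far less severely, and the optimum of the expected loss concentrates on high-probability modes instead of spreading mass over all references, which is one way to realise the sharpening property the abstract relies on.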

Key words: Diffusion model, Text generation, Multimodality problem, Loss function, Convex loss function

CLC Number: TP391

References
[1] YAN Z H, ZHOU C B, LI X C. A review of research on generative diffusion models [J]. Computer Science, 2024, 51(1): 273-283.
[2] DEMIRAG Y, LIU D, NIEHUES J. Benchmarking Diffusion Models for Machine Translation [C]//Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop. 2024: 313-324.
[3] ZHENG L, YUAN J, YU L, et al. A reparameterized discrete diffusion model for text generation [J]. arXiv:2302.05737, 2023.
[4] LI Y, CUI L, YIN Y, et al. Multi-granularity optimization for non-autoregressive translation [J]. arXiv:2210.11017, 2022.
[5] SHAO C, ZHANG J, ZHOU J, et al. Rephrasing the reference for non-autoregressive machine translation [C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2023: 13538-13546.
[6] GHAZVININEJAD M, KARPUKHIN V, ZETTLEMOYER L, et al. Aligned cross entropy for non-autoregressive machine translation [C]//International Conference on Machine Learning. PMLR, 2020: 3515-3523.
[7] SOHL-DICKSTEIN J, WEISS E, MAHESWARANATHAN N, et al. Deep unsupervised learning using nonequilibrium thermodynamics [C]//International Conference on Machine Learning. PMLR, 2015: 2256-2265.
[8] HOOGEBOOM E, NIELSEN D, JAINI P, et al. Argmax flows and multinomial diffusion: Learning categorical distributions [J]. Advances in Neural Information Processing Systems, 2021, 34: 12454-12465.
[9] AUSTIN J, JOHNSON D D, HO J, et al. Structured denoising diffusion models in discrete state-spaces [J]. Advances in Neural Information Processing Systems, 2021, 34: 17981-17993.
[10] LIN S, LIU B, LI J, et al. Common diffusion noise schedules and sample steps are flawed [C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024: 5404-5411.
[11] GHAZVININEJAD M, LEVY O, LIU Y, et al. Mask-predict: Parallel decoding of conditional masked language models [C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 6112-6121.
[12] SHAO C, MA Z, ZHANG M, et al. Beyond MLE: convex learning for text generation [J]. Advances in Neural Information Processing Systems, 2023, 36: 8913-8936.
[13] GONG S, LI M, FENG J, et al. DiffuSeq: Sequence to sequence text generation with diffusion models [J]. arXiv:2210.08933, 2022.
[14] CETTOLO M, NIEHUES J, STÜKER S, et al. Report on the 11th IWSLT evaluation campaign [C]//Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign. 2014: 2-17.
[15] BOJAR O, CHATTERJEE R, FEDERMANN C, et al. Findings of the 2016 conference on machine translation (WMT16) [C]//Proceedings of the First Conference on Machine Translation. Association for Computational Linguistics, 2016: 131-198.
[16] BOJAR O, BUCK C, FEDERMANN C, et al. Findings of the 2014 workshop on statistical machine translation [C]//Proceedings of the 9th Workshop on Statistical Machine Translation. 2014: 12-58.
[17] HUANG X S, PEREZ F, VOLKOVS M. Improving non-autoregressive translation models without distillation [C]//International Conference on Learning Representations. 2022.
[18] HUANG S, DONG L, WANG W, et al. Language is not all you need: Aligning perception with language models [M]//Advances in Neural Information Processing Systems. 2024.
[19] KASAI J, CROSS J, GHAZVININEJAD M, et al. Non-autoregressive machine translation with disentangled context transformer [C]//International Conference on Machine Learning. PMLR, 2020: 5144-5155.
[20] DIELEMAN S, SARTRAN L, ROSHANNAI A, et al. Continuous diffusion for categorical data [J]. arXiv:2211.15089, 2022.
[21] AUSTIN J, JOHNSON D D, HO J, et al. Structured denoising diffusion models in discrete state-spaces [J]. Advances in Neural Information Processing Systems, 2021, 34: 17981-17993.
[22] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[23] PAPINENI K, ROUKOS S, WARD T, et al. Bleu: a method for automatic evaluation of machine translation [C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002: 311-318.
[24] DHINGRA B, MAZAITIS K, COHEN W W. Quasar: Datasets for question answering by search and reading [J]. arXiv:1707.03904, 2017.
[25] SHARMA L, GRAESSER L, NANGIA N, et al. Natural language understanding with the Quora question pairs dataset [J]. arXiv:1907.01041, 2019.
[26] ZHANG B, XIONG D, SU J. A GRU-gated attention model for neural machine translation [J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(11): 4688-4698.
[27] RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer [J]. Journal of Machine Learning Research, 2020, 21(140): 1-67.
[28] GU J, WANG C, ZHAO J. Levenshtein transformer [C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2019: 11181-11191.
[29] LIN C Y. ROUGE: A package for automatic evaluation of summaries [C]//Text Summarization Branches Out. ACL, 2004: 74-81.
[30] ZHANG T, KISHORE V, WU F, et al. BERTScore: Evaluating text generation with BERT [J]. arXiv:1904.09675, 2019.
[31] DESHPANDE A, ANEJA J, WANG L, et al. Fast, diverse and accurate image captioning guided by part-of-speech [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 10695-10704.