Computer Science ›› 2025, Vol. 52 ›› Issue (10): 231-238. doi: 10.11896/jsjkx.240800147
LI Sihui¹, CAI Guoyong², JIANG Hang², WEN Yimin¹,³
Abstract: Diffusion language models hold great promise for text generation: their non-autoregressive generation scheme substantially accelerates inference, and continual refinement through an iterative reconstruction process improves the quality of the generated text. However, diffusion language models are mostly trained with a cross-entropy loss grounded in maximum likelihood estimation, under which even a correctly generated sentence may be penalized for not aligning strictly with the reference. This exposes diffusion language models to a severe multi-modality problem, which greatly degrades generation quality. To alleviate the multi-modality problem, this paper proposes ConvexDiffusion, a discrete diffusion language model trained with a convex loss function, exploiting the property that convex functions can sharpen the optimal distribution so that the model concentrates on high-probability outputs. To further improve generation quality and reduce word repetition, a hybrid-aware noise schedule is designed that varies the number of noised tokens nonlinearly, and a high-confidence deterministic denoising strategy is adopted during decoding. Experimental results on three text generation tasks, machine translation, question generation, and question paraphrasing, show that ConvexDiffusion outperforms leading diffusion models such as RDM and non-autoregressive models such as CMLM by 1 to 7 BLEU while generating faster. Notably, on the two large-scale datasets WMT16 EN-RO and WMT14 EN-DE, ConvexDiffusion surpasses the autoregressive language models that currently dominate text generation.
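To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch, not the paper's released implementation. sharpened_token_loss assumes a simple power-form convex objective (the exact ConvexDiffusion loss may differ), and confident_unmask illustrates high-confidence deterministic denoising for a single sequence. The function names, the exponent gamma, and the unmasking ratio are illustrative assumptions, not values taken from the paper.

    import torch
    import torch.nn.functional as F

    def sharpened_token_loss(logits, targets, gamma=2.0, pad_id=0):
        # logits: (batch, seq_len, vocab_size); targets: (batch, seq_len).
        # MLE training minimizes -log p(y_t); its optimum reproduces the full
        # data distribution, so probability mass is spread over every alignment
        # of the reference. For gamma >= 1, minimizing -p(y_t)**gamma over the
        # probability simplex instead drives the optimum to a point mass on the
        # most likely token, i.e. this (assumed) loss sharpens the optimal
        # distribution toward high-probability outputs.
        probs = F.softmax(logits, dim=-1)
        p_tgt = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        loss = -(p_tgt ** gamma)
        mask = targets.ne(pad_id).float()        # ignore padding positions
        return (loss * mask).sum() / mask.sum()

    @torch.no_grad()
    def confident_unmask(logits, tokens, mask_id, ratio=0.3):
        # One deterministic denoising step for a single sequence.
        # logits: (seq_len, vocab_size); tokens: (seq_len,) with mask_id at
        # still-noised positions. Only the most confident fraction `ratio` of
        # masked positions is committed; the rest stay masked for later steps.
        masked = tokens.eq(mask_id)
        if not masked.any():
            return tokens
        probs = F.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)           # per-position confidence / argmax
        conf = conf.masked_fill(~masked, float("-inf"))  # never rewrite visible tokens
        k = max(1, int(ratio * int(masked.sum())))
        keep = conf.topk(k).indices              # highest-confidence masked slots
        out = tokens.clone()
        out[keep] = pred[keep]
        return out

In this sketch, sharpened_token_loss would stand in for the cross-entropy term on the model's denoising predictions during training, and confident_unmask would be applied iteratively at inference until no mask tokens remain.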
References:
[1] YAN Z H, ZHOU C B, LI X C. A review of research on generative diffusion models[J]. Computer Science, 2024, 51(1): 273-283.
[2] DEMIRAG Y, LIU D, NIEHUES J. Benchmarking diffusion models for machine translation[C]//Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop. 2024: 313-324.
[3] ZHENG L, YUAN J, YU L, et al. A reparameterized discrete diffusion model for text generation[J]. arXiv:2302.05737, 2023.
[4] LI Y, CUI L, YIN Y, et al. Multi-granularity optimization for non-autoregressive translation[J]. arXiv:2210.11017, 2022.
[5] SHAO C, ZHANG J, ZHOU J, et al. Rephrasing the reference for non-autoregressive machine translation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2023: 13538-13546.
[6] GHAZVININEJAD M, KARPUKHIN V, ZETTLEMOYER L, et al. Aligned cross entropy for non-autoregressive machine translation[C]//International Conference on Machine Learning. PMLR, 2020: 3515-3523.
[7] SOHL-DICKSTEIN J, WEISS E, MAHESWARANATHAN N, et al. Deep unsupervised learning using nonequilibrium thermodynamics[C]//International Conference on Machine Learning. PMLR, 2015: 2256-2265.
[8] HOOGEBOOM E, NIELSEN D, JAINI P, et al. Argmax flows and multinomial diffusion: Learning categorical distributions[J]. Advances in Neural Information Processing Systems, 2021, 34: 12454-12465.
[9] AUSTIN J, JOHNSON D D, HO J, et al. Structured denoising diffusion models in discrete state-spaces[J]. Advances in Neural Information Processing Systems, 2021, 34: 17981-17993.
[10] LIN S, LIU B, LI J, et al. Common diffusion noise schedules and sample steps are flawed[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024: 5404-5411.
[11] GHAZVININEJAD M, LEVY O, LIU Y, et al. Mask-Predict: Parallel decoding of conditional masked language models[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 6112-6121.
[12] SHAO C, MA Z, ZHANG M, et al. Beyond MLE: convex learning for text generation[J]. Advances in Neural Information Processing Systems, 2023, 36: 8913-8936.
[13] GONG S, LI M, FENG J, et al. DiffuSeq: Sequence to sequence text generation with diffusion models[J]. arXiv:2210.08933, 2022.
[14] CETTOLO M, NIEHUES J, STÜKER S, et al. Report on the 11th IWSLT evaluation campaign[C]//Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign. 2014: 2-17.
[15] BOJAR O, CHATTERJEE R, FEDERMANN C, et al. Findings of the 2016 conference on machine translation (WMT16)[C]//Proceedings of the First Conference on Machine Translation. Association for Computational Linguistics, 2016: 131-198.
[16] BOJAR O, BUCK C, FEDERMANN C, et al. Findings of the 2014 workshop on statistical machine translation[C]//Proceedings of the 9th Workshop on Statistical Machine Translation. 2014: 12-58.
[17] HUANG X S, PEREZ F, VOLKOVS M. Improving non-autoregressive translation models without distillation[C]//International Conference on Learning Representations. 2022.
[18] HUANG S, DONG L, WANG W, et al. Language is not all you need: Aligning perception with language models[M]//Advances in Neural Information Processing Systems. 2024.
[19] KASAI J, CROSS J, GHAZVININEJAD M, et al. Non-autoregressive machine translation with disentangled context transformer[C]//International Conference on Machine Learning. PMLR, 2020: 5144-5155.
[20] DIELEMAN S, SARTRAN L, ROSHANNAI A, et al. Continuous diffusion for categorical data[J]. arXiv:2211.15089, 2022.
[21] AUSTIN J, JOHNSON D D, HO J, et al. Structured denoising diffusion models in discrete state-spaces[J]. Advances in Neural Information Processing Systems, 2021, 34: 17981-17993.
[22] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[23] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002: 311-318.
[24] DHINGRA B, MAZAITIS K, COHEN W W. Quasar: Datasets for question answering by search and reading[J]. arXiv:1707.03904, 2017.
[25] SHARMA L, GRAESSER L, NANGIA N, et al. Natural language understanding with the Quora Question Pairs dataset[J]. arXiv:1907.01041, 2019.
[26] ZHANG B, XIONG D, SU J. A GRU-gated attention model for neural machine translation[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(11): 4688-4698.
[27] RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. Journal of Machine Learning Research, 2020, 21(140): 1-67.
[28] GU J, WANG C, ZHAO J. Levenshtein transformer[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2019: 11181-11191.
[29] LIN C Y. ROUGE: A package for automatic evaluation of summaries[C]//Text Summarization Branches Out. ACL, 2004: 74-81.
[30] ZHANG T, KISHORE V, WU F, et al. BERTScore: Evaluating text generation with BERT[J]. arXiv:1904.09675, 2019.
[31] DESHPANDE A, ANEJA J, WANG L, et al. Fast, diverse and accurate image captioning guided by part-of-speech[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 10695-10704.