Computer Science ›› 2024, Vol. 51 ›› Issue (8): 263-271. doi: 10.11896/jsjkx.230600184

• Artificial Intelligence •

  • Corresponding author: RUAN Tong (ruantong@ecust.edu.cn)
  • Author e-mail: y30211022@mail.ecust.edu.cn

Semi-supervised Text Style Transfer Method Based on Multi-reward Reinforcement Learning

LI Jingwen, YE Qi, RUAN Tong, LIN Yupian, XUE Wandong   

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2023-06-22  Revised: 2023-11-20  Online: 2024-08-15  Published: 2024-08-13
  • About author: LI Jingwen, born in 1998, postgraduate. Her main research interests include text generation and text style transfer.
    RUAN Tong, born in 1973, professor, Ph.D. supervisor. Her main research interests include medical big data, natural language processing and text generation.
  • Supported by:
    Shanghai Municipal Special Fund for Promoting High-quality Development of Industries (2021-GZL-RGZN-01018) and National Key Research and Development Program of China (2021YFC2701800, 2021YFC2701801).


Abstract: Text style transfer is an important task in natural language processing that aims to change the stylistic attributes of a text while preserving the necessary semantic information. However, in many tasks where large-scale parallel corpora are lacking, existing unsupervised methods suffer from insufficient text diversity and poor semantic consistency. To address these problems, this paper proposes a semi-supervised multi-stage training framework. It first constructs a pseudo-parallel corpus with a style labeling model and a masked language model, guiding the model to learn diverse transfer patterns in a supervised manner. Then, an adversarial similarity reward, a Mis reward, and a style reward are designed to perform reinforcement learning on unlabeled data, enhancing the model's semantic consistency, logical consistency, and style-transfer accuracy. On the sentiment polarity conversion task based on the YELP dataset, the proposed method improves the BLEURT score by 3.1%, the Mis score by 2.5%, and the BLEU score by 9.5%; on the formality conversion experiment based on the GYAFC dataset, it improves the BLEURT score by 6.2% and the BLEU score by 3%.
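The two-stage recipe described in the abstract (build pseudo-parallel pairs by masking style-bearing tokens and letting a masked language model refill them, then fine-tune with reinforcement learning on a mix of rewards) can be sketched in outline. Everything below is an illustrative assumption rather than the authors' implementation: the lexicon-based masking, the linear reward mix, and all function names are stand-ins.

```python
def mask_style_tokens(tokens, style_lexicon, mask="<mask>"):
    """Stage 1 (sketch): hide style-bearing words so a masked language
    model can later fill in target-style replacements, yielding a
    pseudo-parallel pair (source sentence, rewritten sentence)."""
    return [mask if t.lower() in style_lexicon else t for t in tokens]

def combined_reward(sim, mis, style, weights=(1.0, 1.0, 1.0)):
    """Stage 2 (sketch): mix the three reward signals (adversarial
    similarity, Mis/logical consistency, style accuracy) into one scalar.
    A simple weighted sum is assumed here."""
    w_sim, w_mis, w_style = weights
    return w_sim * sim + w_mis * mis + w_style * style

def reinforce_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE-style objective: scale the sampled sequence's summed
    token log-likelihood by the advantage (reward minus a baseline);
    minimizing this pushes the generator toward high-reward outputs."""
    return -(reward - baseline) * sum(token_log_probs)
```

For example, masking `{"terrible"}` in "the food was terrible" keeps the content words intact and leaves a single slot for the masked language model to complete in the target style; the refilled sentence and the original then form one pseudo-parallel training pair.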

Key words: Text generation, Text style transfer, Multi-stage training, Style labeling model, Reinforcement learning

CLC number: TP391