Computer Science ›› 2024, Vol. 51 ›› Issue (8): 263-271.doi: 10.11896/jsjkx.230600184

• Artificial Intelligence •

Semi-supervised Text Style Transfer Method Based on Multi-reward Reinforcement Learning

LI Jingwen, YE Qi, RUAN Tong, LIN Yupian, XUE Wandong   

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2023-06-22  Revised: 2023-11-20  Online: 2024-08-15  Published: 2024-08-13
  • About author: LI Jingwen, born in 1998, postgraduate. Her main research interests include text generation and text style transfer.
    RUAN Tong, born in 1973, professor, Ph.D supervisor. Her main research interests include medical big data, natural language processing and text generation.
  • Supported by:
    Shanghai Municipal Special Fund for Promoting High-quality Development of Industries (2021-GZL-RGZN-01018) and National Key Research and Development Program of China (2021YFC2701800, 2021YFC2701801).

Abstract: Text style transfer is an important task in natural language processing that aims to change the stylistic attributes of a text while preserving its essential semantic content. However, in the many settings that lack large-scale parallel corpora, existing unsupervised methods suffer from insufficient text diversity and poor semantic consistency. To address these problems, this paper proposes a semi-supervised multi-stage training framework. It first constructs a pseudo-parallel corpus with a style labeling model and a masked language model, which guides the model to learn diverse transfer styles in a supervised manner. It then designs an adversarial similarity reward, a Mis reward, and a style reward, and performs reinforcement learning on unlabeled data to enhance the model's semantic consistency, logical consistency, and style transfer accuracy. On the sentiment polarity transfer task based on the YELP dataset, the proposed method improves the BLEURT score by 3.1%, the Mis score by 2.5%, and the BLEU score by 9.5%; on the formality transfer experiment based on the GYAFC dataset, it improves the BLEURT score by 6.2% and the BLEU score by 3%.
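The reinforcement-learning stage described above can be sketched as a REINFORCE-style update (following Williams [33]) whose reward is a combination of the three signals named in the abstract. This is a minimal illustrative sketch only: the weighting scheme, the baseline, and all numeric values are assumptions for illustration, not details taken from the paper.

```python
import math

def combined_reward(r_similarity, r_mis, r_style, weights=(1.0, 1.0, 1.0)):
    """Combine the adversarial similarity, Mis, and style rewards.

    A weighted sum is one simple combination; the paper's exact
    aggregation may differ.
    """
    w_sim, w_mis, w_sty = weights
    return w_sim * r_similarity + w_mis * r_mis + w_sty * r_style

def reinforce_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE objective for one sampled transfer:
    -(R - b) * sum(log p(token)); minimizing this increases the
    probability of samples whose reward exceeds the baseline."""
    return -(reward - baseline) * sum(token_log_probs)

# Toy example: one sampled transfer with three reward scores in [0, 1]
# and illustrative token log-probabilities.
r = combined_reward(0.8, 0.6, 0.9)
loss = reinforce_loss([math.log(0.5), math.log(0.7)], r, baseline=0.5)
```

In practice the log-probabilities come from the generator's decoder and the rewards from the trained scoring models; the baseline reduces gradient variance.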

Key words: Text generation, Text style transfer, Multi-stage training, Style labeling model, Reinforcement learning

CLC Number: TP391

References:
[1]HU Z,LEE R K W,AGGARWAL C C,et al.Text style transfer:A review and experimental evaluation[J].ACM SIGKDD Explorations Newsletter,2022,24(1):14-45.
[2]TOSHEVSKA M,GIEVSKA S.A review of text style transfer using deep learning[J].IEEE Transactions on Artificial Intelligence,2021,3(5):669-684.
[3]JIN D,JIN Z,HU Z,et al.Deep learning for text style transfer:A survey[J].Computational Linguistics,2022,48(1):155-205.
[4]LI J,JIA R,HE H,et al.Delete,retrieve,generate:a simple approach to sentiment and style transfer[J].arXiv:1804.06437,2018.
[5]LYU Y,LIANG P P,PHAM H,et al.StylePTB:A Compositional Benchmark for Fine-grained Controllable Text Style Transfer[C]//Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2021:2116-2138.
[6]KASHYAP A R,HAZARIKA D,KAN M Y,et al.So Different Yet So Alike! Constrained Unsupervised Text Style Transfer[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics(Volume 1:Long Papers).2022:416-431.
[7]LIU D,FU J,ZHANG Y,et al.Revision in continuous space:Unsupervised text style transfer without adversarial learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:8376-8383.
[8]RILEY P,CONSTANT N,GUO M,et al.TextSETTR:Few-Shot Text Style Extraction and Tunable Targeted Restyling[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing(Volume 1:Long Papers).2021:3786-3800.
[9]NARASIMHAN S,DEY S,DESARKAR M.Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer[C]//Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2022:456-474.
[10]LUO F,LI P,ZHOU J,et al.A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer[C]//Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence.International Joint Conferences on Artificial Intelligence Organization,2019.
[11]LAI H,TORAL A,NISSIM M.Generic resources are what you need:Style transfer tasks without task-specific parallel training data[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.2021:4241-4254.
[12]LEE J.Stable Style Transformer:Delete and Generate Approach with Encoder-Decoder for Text Style Transfer[C]//Proceedings of the 13th International Conference on Natural Language Generation.2020:195-204.
[13]LEE D,TIAN Z,XUE L,et al.Enhancing Content Preservation in Text Style Transfer Using Reverse Attention and Conditional Layer Normalization[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing(Volume 1:Long Papers).2021:93-102.
[14]WANG J,ZHANG R,CHEN J,et al.Text Style Transferring via Adversarial Masking and Styled Filling[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing.2022:7654-7663.
[15]PAPINENI K,ROUKOS S,WARD T,et al.Bleu:a method for automatic evaluation of machine translation[C]//Proceedings of the 40th annual meeting of the Association for Computational Linguistics.2002:311-318.
[16]LIN C Y.ROUGE:A Package for Automatic Evaluation of summaries[C]//Proceedings of the Workshop on Text Summarization Branches Out(WAS 2004).2004.
[17]LEWIS M,LIU Y,GOYAL N,et al.BART:Denoising Sequence-to-Sequence Pre-training for Natural Language Generation,Translation,and Comprehension[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.2020:7871-7880.
[18]SELLAM T,DAS D,PARIKH A.BLEURT:Learning Robust Metrics for Text Generation[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.2020:7881-7892.
[19]BABAKOV N,DALE D,LOGACHEVA V,et al.A large-scale computational study of content preservation measures for text style transfer and paraphrase generation[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics:Student Research Workshop.2022:300-321.
[20]TOKPO E K,CALDERS T.Text Style Transfer for Bias Mitigation using Masked Language Modeling[C]//Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies:Student Research Workshop.2022:163-171.
[21]REID M,ZHONG V.LEWIS:Levenshtein Editing for Unsupervised Text Style Transfer[C]//Findings of the Association for Computational Linguistics:ACL-IJCNLP 2021.2021:3932-3944.
[22]LI Z,QU L,XU Q,et al.Variational autoencoder with disentanglement priors for low-resource task-specific natural language generation[C]//2022 Conference on Empirical Methods in Natural Language Processing(EMNLP 2022).Association for Computational Linguistics,2022:10335-10356.
[23]YI X,LIU Z,LI W,et al.Text style transfer via learning style instance supported latent space[C]//Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence.2021:3801-3807.
[24]NOURI N.Text Style Transfer via Optimal Transport[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing.Abu Dhabi,United Arab Emirates:Association for Computational Linguistics,2022:2532-2541.
[25]DENG M,WANG J,HSIEH C P,et al.RLPrompt:Optimizing Discrete Text Prompts with Reinforcement Learning[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing.2022:3369-3391.
[26]LIU Z,CHEN N.Learning from Bootstrapping and Stepwise Reinforcement Reward:A Semi-Supervised Framework for Text Style Transfer[C]//Findings of the Association for Computational Linguistics:NAACL 2022.2022:2633-2648.
[27]KRISHNA K,WIETING J,IYYER M.Reformulating Unsupervised Style Transfer as Paraphrase Generation[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing(EMNLP).2020:737-762.
[28]LAFFERTY J,MCCALLUM A,PEREIRA F.Conditional random fields:Probabilistic models for segmenting and labeling sequence data[C]//ICML.2001.
[29]CHEN K J,FEI Z Y,CHEN J Q,et al.A survey on text style transfer[J].Journal of Software,2022,33(12):20.
[30]ZHAO J,KIM Y,ZHANG K,et al.Adversarially regularized autoencoders[C]//International Conference on Machine Learning.PMLR,2018:5902-5911.
[31]GOODFELLOW I J,POUGET-ABADIE J,MIRZA M,et al.Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2.2014:2672-2680.
[32]HUANG Y,ZHU W,XIONG D,et al.Cycle-Consistent Adversarial Autoencoders for Unsupervised Text Style Transfer[C]//Proceedings of the 28th International Conference on Computational Linguistics.2020:2213-2223.
[33]WILLIAMS R J.Simple statistical gradient-following algorithms for connectionist reinforcement learning[J].Machine Learning,1992,8:229-256.
[34]RAO S,TETREAULT J.Dear sir or madam,may I introduce the GYAFC dataset:corpus,benchmarks and metrics for formality style transfer[C]//Proceedings of the ACL.2018:129-140.
[35]WOLF T,DEBUT L,SANH V,et al.Transformers:State-of-the-art natural language processing[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing:System Demonstrations.2020:38-45.
[36]KIM Y.Convolutional Neural Networks for Sentence Classification[J].arXiv:1408.5882,2014.
[37]KINGMA D P,BA J.Adam:A method for stochastic optimization[J].arXiv:1412.6980,2014.
[38]HE J,WANG X,NEUBIG G,et al.A Probabilistic Formulation of Unsupervised Text Style Transfer[C]//International Conference on Learning Representations.2019.
[39]XU P,CHEUNG J C K,CAO Y.On variational learning of controllable representations for text without supervision[C]//International Conference on Machine Learning.PMLR,2020:10534-10543.
[40]HUANG F,CHEN Z,WU C H,et al.NAST:A Non-Autoregressive Generator with Word Alignment for Unsupervised Text Style Transfer[C]//Findings of the Association for Computational Linguistics:ACL-IJCNLP 2021.2021:1577-1590.