Computer Science, 2023, Vol. 50, Issue 11A: 230300072-8. doi: 10.11896/jsjkx.230300072

• Artificial Intelligence •

Automatic Post-editing Ensemble Model of Patent Translation Based on Weighted Distribution of Translation Errors

ZHAO Sanyuan, WANG Peiyan, YE Na, ZHAO Xinyu, CAI Dongfeng, ZHANG Guiping   

  1. Human-Computer Intelligence Research Center, Shenyang Aerospace University, Shenyang 110136, China
  • Published: 2023-11-09
  • Corresponding author: WANG Peiyan (wangpy@sau.edu.cn)
  • About author: ZHAO Sanyuan, born in 1997, postgraduate (17393890485@163.com). His main research interests include NLP and machine translation.
    WANG Peiyan, born in 1983, Ph.D., senior engineer, is a member of China Computer Federation. His main research interests include NLP, machine learning and knowledge engineering.
  • Supported by:
    National Natural Science Foundation of China (U1908216), Humanities and Social Science Research Youth Fund of the Ministry of Education (19YJC740107) and Shenyang Science and Technology Plan (20-202-1-28).

Abstract: Automatic post-editing (APE) is a method of automatically correcting errors in machine translation output, and it can improve the translation quality of machine translation systems. Current APE research focuses mainly on general domains; there has been little work on APE for patent translation, which is highly specialized and demands high translation quality. This paper studies automatic post-editing of patent translations and proposes an APE ensemble model weighted by the distribution of translation error classes. First, a term-weighted translation edit rate (WTER) is proposed, which adds a term probability factor for each word to the translation edit rate (TER), so that samples with more term errors receive higher WTER values. Then, WTER is used to select subsets of mistranslation, missing translation, additional translation and shift error samples from the training data constructed from three machine translation systems, and an error-correction-biased APE sub-model is built on each subset. Finally, the error-correction-biased sub-models are combined in an ensemble weighted by the distribution of translation error classes. The method is designed for the high degree of specialization and dense terminology of patents: each sub-model is biased toward correcting one class of errors, and the ensemble accounts for the diversity of translation errors. Experimental results on an English-Chinese patent abstract dataset show that, compared with three baseline systems, the proposed method improves BLEU by 2.52, 2.28 and 2.27 on average, respectively.
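
The abstract does not give the exact WTER formula or the ensemble weighting scheme, so the following Python sketch only illustrates the two ideas under assumptions of ours: a word-level edit distance in which editing a word costs more when its term probability is high, and a normalization of per-class error counts into weights for the class-specific sub-models. Names such as term_prob and alpha, and the toy numbers, are hypothetical; real TER also handles block shifts, which this sketch omits.

from typing import Dict, List

def wter(hyp: List[str], ref: List[str], term_prob: Dict[str, float], alpha: float = 1.0) -> float:
    """Simplified term-weighted edit rate: editing word w costs 1 + alpha * term_prob[w],
    and the total cost is normalized by the weighted reference length (analogous to TER's
    normalization by reference length)."""
    def cost(w: str) -> float:
        return 1.0 + alpha * term_prob.get(w, 0.0)
    m, n = len(hyp), len(ref)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + cost(hyp[i - 1])            # remove an extra hypothesis word
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + cost(ref[j - 1])            # insert a missing reference word
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if hyp[i - 1] == ref[j - 1] else max(cost(hyp[i - 1]), cost(ref[j - 1]))
            d[i][j] = min(d[i - 1][j] + cost(hyp[i - 1]),   # deletion
                          d[i][j - 1] + cost(ref[j - 1]),   # insertion
                          d[i - 1][j - 1] + sub)            # substitution or match
    total_ref_weight = sum(cost(w) for w in ref) or 1.0
    return d[m][n] / total_ref_weight

def ensemble_weights(error_counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize per-class error counts (e.g. counted on a development set) into
    weights for the corresponding error-correction-biased APE sub-models."""
    total = sum(error_counts.values()) or 1
    return {cls: cnt / total for cls, cnt in error_counts.items()}

# Toy usage with hypothetical term probabilities and error-class counts.
probs = {"catalyst": 0.9, "reactor": 0.8}
print(wter("the catalyst is heat".split(), "the catalyst is heated".split(), probs))
print(ensemble_weights({"mistranslation": 40, "missing": 25, "addition": 20, "shift": 15}))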

Key words: Automatic post-editing, Patent translation, Distribution of translation errors, Ensemble, Translation edit rate

CLC number: TP391