Computer Science, 2023, Vol. 50, Issue 11A: 230300072-8. doi: 10.11896/jsjkx.230300072
赵三元, 王裴岩, 叶娜, 赵欣瑜, 蔡东风, 张桂平
ZHAO Sanyuan, WANG Peiyan, YE Na, ZHAO Xinyu, CAI Dongfeng, ZHANG Guiping
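Abstract: Automatic post-editing (APE) automatically corrects errors in machine translation output and can improve the translation quality of machine translation systems. Current APE research focuses mainly on the general domain; APE for patent translations, which are highly specialized and demand high translation quality, has received little attention. This paper studies automatic post-editing of patent translations and proposes an ensemble APE model for patent translations weighted by the distribution of translation error classes. First, a term-weighted translation edit rate (WTER) is proposed, which adds a term probability factor for each word to the translation edit rate (TER), so that samples with more term errors receive higher WTER values. Then, WTER is used to select subsets of mistranslation, omission, addition, and shift error samples from training data constructed from three machine translation systems, and each subset is used to build an APE sub-model biased toward correcting that error type. Finally, the error-correction-biased APE sub-models are weighted by the translation error class distribution. The method targets the highly specialized, term-dense nature of patents: each sub-model addresses one class of error, accounting for error-correction bias, while the ensemble covers the diversity of translation errors. Experiments on an English-Chinese patent abstract dataset show that, compared with the three baseline systems, the proposed method improves BLEU by 2.52, 2.28, and 2.27 on average, respectively.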
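The idea behind a term-weighted edit rate can be illustrated with a minimal sketch. The abstract does not give the WTER formula, so everything specific here is an assumption: the function name `weighted_edit_rate`, the `term_prob` dictionary of per-word term probabilities, and the cost `1 + p(word)` per edit are hypothetical, and the shift operation of TER is omitted for brevity. The sketch only shows how weighting edits by term probability makes a term error count for more than an ordinary word error.

```python
from typing import Dict, List


def weighted_edit_rate(hyp: List[str], ref: List[str],
                       term_prob: Dict[str, float]) -> float:
    """Word-level edit distance with term-weighted costs, divided by reference length.

    Simplified illustration only: no shift edits, and the weighting
    scheme (1 + term probability) is an assumption, not the paper's formula.
    """
    def w(tok: str) -> float:
        # Ordinary words cost 1; words likely to be terms cost up to 2.
        return 1.0 + term_prob.get(tok, 0.0)

    n, m = len(hyp), len(ref)
    # dp[i][j] = minimum weighted cost of turning hyp[:i] into ref[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + w(hyp[i - 1])  # delete an extra hypothesis word
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + w(ref[j - 1])  # insert a missing reference word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if hyp[i - 1] == ref[j - 1]:
                sub = dp[i - 1][j - 1]  # match, no cost
            else:
                # substitution is charged at the heavier of the two words
                sub = dp[i - 1][j - 1] + max(w(hyp[i - 1]), w(ref[j - 1]))
            dp[i][j] = min(sub,
                           dp[i - 1][j] + w(hyp[i - 1]),
                           dp[i][j - 1] + w(ref[j - 1]))
    return dp[n][m] / m if m else 0.0


if __name__ == "__main__":
    ref = ["the", "valve", "is", "closed"]
    probs = {"valve": 0.9}  # hypothetical term probability
    term_error = weighted_edit_rate(["the", "tap", "is", "closed"], ref, probs)
    plain_error = weighted_edit_rate(["the", "valve", "is", "shut"], ref, probs)
    print(term_error > plain_error)
```

With all term probabilities set to zero, the score reduces to an unweighted word error rate, which is why samples with many term errors rank higher only when term probabilities are supplied.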