Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 230300072-8. doi: 10.11896/jsjkx.230300072

• Artificial Intelligence •

Automatic Post-editing Ensemble Model of Patent Translation Based on Weighted Distribution of Translation Errors

ZHAO Sanyuan, WANG Peiyan, YE Na, ZHAO Xinyu, CAI Dongfeng, ZHANG Guiping   

  1. Human-Computer Intelligence Research Center,Shenyang Aerospace University,Shenyang 110136,China
  • Published:2023-11-09
  • About author:ZHAO Sanyuan,born in 1997,postgraduate.His main research interests include NLP and machine translation.
    WANG Peiyan,born in 1983,Ph.D,senior engineer,is a member of China Computer Federation.His main research interests include NLP,machine learning and knowledge engineering.
  • Supported by:
    National Natural Science Foundation of China(U1908216),Humanities and Social Science Research Youth Fund Project of the Ministry of Education(19YJC740107) and Shenyang Science and Technology Plan(20-202-1-28).

Abstract: Automatic post-editing(APE) is a method of automatically correcting errors in machine translation output,which can improve the quality of machine translation systems.Current APE research focuses mainly on general domains,and there is little work on APE for patent translation,which demands high translation quality because of its strong domain specificity.This paper proposes an ensemble APE model for patent translation based on the weighted distribution of translation errors.Firstly,a term weighted translation edit rate(WTER) calculation method is proposed,which introduces a term probability factor into the translation edit rate(TER) and raises the WTER value of samples containing more term errors.Then,the proposed WTER method is used to select subsets of mistranslation,missing translation,additional translation and shift error samples from the training data constructed from three machine translation systems,and an error-correction-biased APE sub-model is built on each subset.Finally,the biased APE sub-models are ensembled according to the weighted distribution of translation errors.The proposed method accounts for the strong domain specificity and numerous technical terms of patent translation and,on the basis of error-correction bias,integrates multiple sub-models to balance the diversity of translation errors.Experimental results on an English-Chinese patent abstract dataset show that,compared with the three baseline systems,the proposed method improves BLEU by an average of 2.52,2.28 and 2.27 points,respectively.
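The core idea of WTER described above — a TER-style edit rate in which edits touching domain terms are penalized more heavily — can be illustrated with a minimal sketch. This is an assumption-laden approximation, not the paper's actual formula: it uses word-level Levenshtein edits (ignoring TER's shift operation), an illustrative term list, and an arbitrary term weight of 2.0.

```python
def weighted_edit_rate(hyp, ref, terms, term_weight=2.0):
    """Word-level edit rate where edits involving domain terms cost more.

    hyp, ref: lists of tokens; terms: set of domain-term tokens.
    Approximates TER (edits / reference length) without shifts.
    """
    def cost(word):
        # Term probability factor stand-in: term edits cost term_weight.
        return term_weight if word in terms else 1.0

    n, m = len(hyp), len(ref)
    # dp[i][j]: minimal weighted edit cost between hyp[:i] and ref[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + cost(hyp[i - 1])        # deletions
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + cost(ref[j - 1])        # insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if hyp[i - 1] == ref[j - 1] else \
                max(cost(hyp[i - 1]), cost(ref[j - 1]))   # substitution
            dp[i][j] = min(dp[i - 1][j] + cost(hyp[i - 1]),
                           dp[i][j - 1] + cost(ref[j - 1]),
                           dp[i - 1][j - 1] + sub)
    return dp[n][m] / m  # normalize by reference length, as in TER


hyp = "the device cleans the glass container".split()
ref = "the device cleans the glass bottle".split()
# With "bottle"/"container" marked as terms, this single term error
# scores worse (2/6) than it would under an unweighted rate (1/6).
print(weighted_edit_rate(hyp, ref, terms={"bottle", "container"}))
```

Under such a metric, samples with many term errors rank higher, which is how the paper's sub-model training subsets (mistranslation, missing translation, etc.) could be selected.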

Key words: Automatic post-editing, Patent translation, Distribution of translation errors, Ensemble, Translation edit rate

CLC Number: TP391