Computer Science (计算机科学), 2025, Vol. 52, Issue 11A: 250200007-9. doi: 10.11896/jsjkx.250200007
WANG Xueni (王雪妮), YE Na (叶娜), ZHANG Guiping (张桂平)
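Abstract: Translation quality estimation (QE) is the task of assessing the quality of machine translation output without reference translations. Existing QE systems perform well in general domains but poorly in specialized domains that contain many technical terms (such as engineering, medicine, and law), because they focus on the semantic similarity between the source text and the translation and are insensitive to errors in terminology translation. To address this problem, a translation quality estimation method based on a cross-lingual term attention mechanism is proposed. First, prompt templates are designed to guide GPT in identifying bilingual terms. Second, a sentence encoding module produces sentence representations, which are then explicitly fused with the bilingual term information to obtain enhanced sentence representations. Third, a cross-attention mechanism generates cross-lingual representations for the two languages, and their semantic similarity is computed and used as the term feature. Finally, a knowledge enhancement layer (KEL) is introduced into the QE model; it fuses the term feature with the neural features output by the model and passes the result through a feed-forward neural network to obtain the predicted score. Experiments on an English-Chinese engineering literature dataset show that, compared with the current state-of-the-art baseline, the proposed method improves the primary metric, the Spearman correlation coefficient, by 3.77 percentage points, and improves the auxiliary metrics, the Pearson and Kendall correlation coefficients, by 3.07 and 4.45 percentage points respectively.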
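The last two steps of the pipeline (cross-attention to obtain a term feature, then fusion in the KEL) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything concrete here is an assumption made for illustration: unprojected scaled dot-product attention, mean pooling, cosine similarity as the semantic similarity, concatenation as the fusion step, a single ReLU hidden layer, and all function names (`cross_attention`, `term_feature`, `kel_score`) and toy vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Each query token attends over the tokens of the other language.

    Assumption: plain scaled dot-product attention with no learned projections.
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(keys_values[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, keys_values)) for d in range(dim)]
        )
    return attended


def mean_pool(tokens):
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(len(tokens[0]))]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def term_feature(src_tokens, tgt_tokens):
    """Similarity of the two cross-lingual representations (assumed: cosine)."""
    src_ctx = mean_pool(cross_attention(src_tokens, tgt_tokens))
    tgt_ctx = mean_pool(cross_attention(tgt_tokens, src_tokens))
    return cosine(src_ctx, tgt_ctx)


def kel_score(neural_features, term_feat, w_hidden, b_hidden, w_out, b_out):
    """Hypothetical KEL: concatenate features, then a one-hidden-layer FFN."""
    fused = list(neural_features) + [term_feat]
    hidden = [max(0.0, dot(row, fused) + b) for row, b in zip(w_hidden, b_hidden)]
    return dot(w_out, hidden) + b_out


if __name__ == "__main__":
    # Toy term-enhanced token embeddings (hypothetical values).
    src = [[1.0, 0.0], [0.0, 1.0]]
    tgt = [[0.9, 0.1], [0.2, 0.8]]
    feat = term_feature(src, tgt)
    score = kel_score([0.5, -0.2], feat,
                      w_hidden=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                      b_hidden=[0.0, 0.0], w_out=[1.0, 1.0], b_out=0.1)
    print(feat, score)
```

In a real system the token vectors would come from the sentence encoder after term fusion, and the attention and feed-forward weights would be learned; the sketch only shows how a single scalar term feature can be combined with the model's neural features before regression.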