Computer Science, 2022, Vol. 49, Issue (6): 305-312. doi: 10.11896/jsjkx.210500117

• Artificial Intelligence

Machine Translation Method Integrating New Energy Terminology Knowledge

DONG Zhen-heng1, REN Wei-ping2, YOU Xin-dong1, LYU Xue-qiang1   

  1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
  2 School of Foreign Languages, Beijing Information Science and Technology University, Beijing 100192, China
  • Received: 2021-05-17 Revised: 2021-12-10 Online: 2022-06-15 Published: 2022-06-08
  • About author: DONG Zhen-heng, born in 1995, postgraduate. His main research interests include natural language processing and machine translation.
    REN Wei-ping, born in 1962, professor. Her main research interests include applied linguistics, among others.
  • Supported by:
    Natural Science Foundation of Beijing, China (4212020), National Natural Science Foundation of China (61671070), Qin Xin Talents Cultivation Program of Beijing Information Science & Technology University (QXTCPB201908), and Research Planning of Beijing Municipal Commission of Education (KM202111232001).

Abstract: In domain-specific machine translation, whether domain terms are translated correctly plays a decisive role in translation quality. Effectively integrating domain terms into a neural machine translation model, and thereby improving their translation quality, is therefore of practical significance. This paper proposes a method that integrates terminology from the new energy domain into neural machine translation as prior knowledge. Using a term dictionary built from a bilingual term knowledge base for the new energy domain as the medium, the paper proposes and compares two ways of integrating this knowledge: 1) term replacement, in which the source term is replaced with its target term on the source side; 2) term append, in which the source term and its target term are concatenated on the source side. In addition, special identifiers, serving as external knowledge, mark the beginning and end of the target term on both the source side and the target side. Experiments are conducted on a Chinese-English parallel corpus in the new energy domain and on a constructed Chinese-English parallel corpus. On the test set, the proposed methods improve the BLEU score over the baseline by 6.38 and 6.55 points, respectively, which shows that they can effectively integrate domain term knowledge into the translation model and improve the translation quality of domain terms.
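The two source-side strategies described above can be illustrated with a minimal Python sketch. It assumes already-tokenized sentences, a term dictionary mapping source-term token tuples to target-term strings, greedy longest-match lookup, and hypothetical boundary tokens `<t>` and `</t>`. These names, the matching strategy, and wrapping the target term with identifiers in both modes are illustrative assumptions, not the paper's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical boundary identifiers; the tokens actually used in the paper are not given here.
TERM_START, TERM_END = "<t>", "</t>"

TermDict = Dict[Tuple[str, ...], str]  # source-term tokens -> target term string


def find_terms(tokens: List[str], term_dict: TermDict) -> List[Tuple[int, int, str]]:
    """Greedy longest-match lookup; returns non-overlapping (start, end, target) spans."""
    max_len = max((len(k) for k in term_dict), default=0)
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so nested shorter terms are not matched.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in term_dict:
                spans.append((i, i + n, term_dict[key]))
                i += n
                break
        else:  # no term starts at position i
            i += 1
    return spans


def integrate_terms(tokens: List[str], term_dict: TermDict, mode: str) -> List[str]:
    """Source-side integration (assumed formats).

    mode="replace": source term -> <t> target term </t>
    mode="append":  source term -> source term <t> target term </t>
    """
    if mode not in ("replace", "append"):
        raise ValueError(f"unknown mode: {mode}")
    out: List[str] = []
    prev = 0
    for start, end, target in find_terms(tokens, term_dict):
        out.extend(tokens[prev:start])      # text before the term
        if mode == "append":
            out.extend(tokens[start:end])   # keep the original source term
        out.append(TERM_START)
        out.extend(target.split())
        out.append(TERM_END)
        prev = end
    out.extend(tokens[prev:])               # text after the last term
    return out


def mark_target(tokens: List[str], target_terms: List[str]) -> List[str]:
    """Target-side marking: wrap each known target term in the reference with identifiers."""
    # Map each target term to itself, so "replace" only inserts the boundary identifiers.
    identity = {tuple(t.split()): t for t in target_terms}
    return integrate_terms(tokens, identity, mode="replace")


if __name__ == "__main__":
    terms = {("光伏", "组件"): "photovoltaic module"}
    src = ["光伏", "组件", "效率", "提高"]
    print(" ".join(integrate_terms(src, terms, "replace")))
    # <t> photovoltaic module </t> 效率 提高
    print(" ".join(integrate_terms(src, terms, "append")))
    # 光伏 组件 <t> photovoltaic module </t> 效率 提高
    print(" ".join(mark_target("the photovoltaic module efficiency improves".split(),
                               ["photovoltaic module"])))
    # the <t> photovoltaic module </t> efficiency improves
```

Longest-match lookup is one common way to avoid tagging a shorter term nested inside a longer one; the abstract does not specify how term matches are located in the corpus.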

Key words: Domain machine translation, Domain terms, Prior knowledge, Special identification, Term append, Term replacement

CLC Number: TP391