Computer Science ›› 2017, Vol. 44 ›› Issue (12): 216-220. doi: 10.11896/j.issn.1002-137X.2017.12.039
龚慧敏,段湘煜,张民
GONG Hui-min, DUAN Xiang-yu and ZHANG Min
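Abstract: Word alignment is a key component of statistical machine translation systems. However, word alignments are usually computed with sequence models that ignore the structural information and linguistic features of language, so the resulting alignments contain links that violate those features. This paper proposes a self-correction mechanism that repairs the erroneous parts of a word alignment. The mechanism uses linguistic prior knowledge to correct alignment results from coarse to fine granularity: a punctuation-based method first performs coarse-grained correction on sentence pairs, and a method based on statistical features then performs fine-grained correction on sub-sentence (clause) pairs. The self-correction process requires no additional word alignment tools and no new corpora. Experimental results show that self-corrected word alignment significantly improves alignment accuracy and improves machine translation quality. The coarse-grained correction contributes the largest gain in translation quality, the fine-grained correction also improves translation quality, and combining the two yields a significant improvement over the baseline system.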
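The abstract does not spell out the correction rules, so the following is only a minimal sketch of what a punctuation-based coarse-grained step could look like, not the authors' method. It assumes that both sentences are split into clauses at punctuation marks, that the k-th source clause corresponds to the k-th target clause, and that any alignment link crossing clause boundaries is treated as an error and dropped. The names `PUNCT`, `clause_ids`, and `coarse_correct` are hypothetical.

```python
# Hypothetical punctuation set (ASCII and full-width); an assumption, not from the paper.
PUNCT = {",", ".", ";", "!", "?", ",", "。", ";", "!", "?"}


def clause_ids(tokens):
    """Map each token position to the index of the clause it belongs to."""
    ids, current = [], 0
    for tok in tokens:
        ids.append(current)
        if tok in PUNCT:
            current += 1  # a punctuation mark closes the current clause
    return ids


def coarse_correct(src_tokens, tgt_tokens, links):
    """Drop alignment links whose two endpoints fall in different clauses.

    `links` is a list of (source_index, target_index) pairs. If the two
    sentences do not split into the same number of clauses, the assumed
    one-to-one clause pairing does not hold and the links are returned unchanged.
    """
    if not src_tokens or not tgt_tokens:
        return list(links)
    src_ids = clause_ids(src_tokens)
    tgt_ids = clause_ids(tgt_tokens)
    if src_ids[-1] != tgt_ids[-1]:
        return list(links)
    return [(i, j) for i, j in links if src_ids[i] == tgt_ids[j]]


src = ["我", "喜欢", "猫", ",", "他", "喜欢", "狗", "。"]
tgt = ["I", "like", "cats", ",", "he", "likes", "dogs", "."]
# The link (5, 1) wrongly ties the second "喜欢" to "like" in the first clause.
links = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 1), (6, 6), (7, 7)]
corrected = coarse_correct(src, tgt, links)
print(corrected)
```

In this toy sentence pair only the cross-clause link `(5, 1)` is removed. A fine-grained step based on statistical features, as the abstract describes, would then operate inside each clause pair; it is not sketched here because the abstract gives no detail on those features.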