Computer Science, 2017, Vol. 44, Issue (12): 216-220. doi: 10.11896/j.issn.1002-137X.2017.12.039

• Artificial Intelligence •

Self-correction of Word Alignments

GONG Hui-min, DUAN Xiang-yu and ZHANG Min

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China project "Research on a Synchronous Phrase-Tree Structure Reduction Mechanism for Statistical Machine Translation" (61273319).

Abstract: Word alignment is an important component of statistical machine translation systems. Word alignments are usually obtained from sequence models, which take into account neither the structural information nor the linguistic features of the language, so the resulting alignments contain links that violate linguistic characteristics. This paper proposes a self-correction mechanism for word alignment that uses linguistic prior knowledge to correct such errors in a coarse-to-fine manner. First, a punctuation-based method performs coarse-grained correction on sentence pairs, using binary segmentation at punctuation to obtain short alignments for clause pairs. Second, a method based on statistical features performs fine-grained correction on each clause pair. Third, the corrected short alignments are merged back into the original alignments. The process requires no other word alignment tool and no additional parallel corpus. Experimental results show that self-correction significantly improves word alignment accuracy and improves machine translation quality. The coarse-grained correction contributes the largest gain in translation quality, the fine-grained correction also improves it, and combining the two yields a significant improvement over the baseline system.

Key words: Self-correction, Word alignment, Coarse-to-fine
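The abstract does not spell out the segmentation procedure, so the following is only a minimal sketch of what a punctuation-based coarse correction could look like, not the authors' implementation. It assumes (hypothetically) that both sentences split into the same number of punctuation-delimited clauses in the same order, pairs the k-th source clause with the k-th target clause, and drops any alignment link that crosses a clause boundary. The names `PUNCT`, `clause_ids`, and `coarse_correct` are illustrative.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source token index, target token index)

# Assumed punctuation inventory (ASCII and full-width); not taken from the paper.
PUNCT = {",", ".", ";", "!", "?", ",", "。", ";", "!", "?"}


def clause_ids(tokens: List[str]) -> List[int]:
    """Give each token the index of its clause; a punctuation token closes its clause."""
    ids = []
    current = 0
    for tok in tokens:
        ids.append(current)
        if tok in PUNCT:
            current += 1
    return ids


def coarse_correct(src: List[str], tgt: List[str], links: Set[Link]) -> Set[Link]:
    """Drop links whose two endpoints lie in differently numbered clauses.

    Assumption: clauses correspond one-to-one and in order. If the clause
    counts differ, that assumption fails and the links are returned unchanged.
    """
    src_ids = clause_ids(src)
    tgt_ids = clause_ids(tgt)
    if len(set(src_ids)) != len(set(tgt_ids)):
        return set(links)
    return {(i, j) for (i, j) in links if src_ids[i] == tgt_ids[j]}


# Toy sentence pair; the link (1, 5) wrongly ties the first "喜欢" to "likes"
# in the second clause.
src = ["我", "喜欢", "猫", ",", "他", "喜欢", "狗", "。"]
tgt = ["I", "like", "cats", ",", "he", "likes", "dogs", "."]
links = {(k, k) for k in range(8)} | {(1, 5)}

corrected = coarse_correct(src, tgt, links)
print(sorted(links - corrected))
```

In this toy example only the cross-clause link is removed. A fine-grained step based on statistical features, as the abstract describes, would then operate within each clause pair; that step is not sketched here because the abstract does not name the features.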

