Computer Science ›› 2010, Vol. 37 ›› Issue (4): 215-.

• Artificial Intelligence •

Multi-Strategy Chinese-Uyghur Sentence Alignment

田生伟,吐尔根·依布拉音,禹龙,加米拉·吾守尔,杨飞宇   

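  1. (School of Information Science and Engineering, Xinjiang University, Urumqi 830046); (Network Center, Xinjiang University, Urumqi 830046); (School of International Cultural Exchange, Xinjiang University, Urumqi 830046)
  • Publication date: 2018-12-01 Release date: 2018-12-01
  • Funding:
    This work was supported by the National Natural Science Foundation of China (60663006, 60963017) and the Scientific Research Program of Higher Education Institutions of the Xinjiang Uygur Autonomous Region (XJEDU2009I05).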

Chinese-Uyghur Sentence Alignment Based on Hybrid Strategy

TIAN Sheng-wei,TURGUN Ibrahim,YU Long,JAMILA Wushouer,YANG Fei-yu   

  • Online:2018-12-01 Published:2018-12-01

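Abstract: This paper proposes an error-suppressing multi-strategy algorithm for aligning Chinese and Uyghur sentences. Because length-based alignment algorithms cannot avoid error propagation, a new strategy for suppressing that propagation is proposed: using word co-occurrence information from the bilingual corpus, Chinese-Uyghur word collocations are extracted automatically and combined with sentence-length features to find 1:1 sentence pairs that serve as anchors, so that error propagation is confined by the anchors. Between anchors, sentences are aligned with a hybrid method that combines punctuation and length. Experimental results show that the anchors found by the multi-strategy algorithm are highly precise and effectively suppress the propagation of alignment errors. The hybrid alignment algorithm avoids the high time complexity of lexicon-based alignment algorithms and substantially outperforms traditional alignment algorithms, raising alignment accuracy from 95.0% to 97.6%.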

Key words: Bilingual corpora, Error suppression, Sentence alignment, Hybrid strategy, Chinese-Uyghur sentences

Abstract: This paper proposes a hybrid algorithm for sentence alignment in Chinese-Uyghur parallel corpora. To address the error propagation that affects length-based alignment algorithms, it presents a new strategy for suppressing that propagation. Using sentence length and Chinese-Uyghur correspondence information, 1:1 sentence pairs are identified as anchor points that confine the spread of errors. Between anchor points, an approach based on both length and punctuation is used to align sentences. Experimental results verify that anchor points are identified with high precision and that the spread of errors is effectively restrained. The hybrid alignment algorithm avoids the high time complexity of word-based algorithms. In addition, it outperforms traditional alignment algorithms, increasing alignment accuracy from 95.0% to 97.6% and recall from 96.8% to 98.2%, and the validity evaluation method can find noisy alignments efficiently.

Key words: Bilingual corpora, Error curb, Hybrid strategy, Sentence alignment, Chinese-Uyghur sentence
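
The anchor-then-align pipeline described in the abstract can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function names (`find_anchors`, `align_segment`, `align`), the cost function, the bead penalties, and the greedy anchor search are all hypothetical, and a small hand-written word-pair dictionary stands in for the collocations that the paper extracts automatically from co-occurrence statistics. The toy sentences are ASCII placeholders rather than real Chinese or Uyghur text.

```python
from typing import Dict, List, Tuple

Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)

# Punctuation marks counted as an alignment cue (ASCII, Chinese, Arabic-script).
PUNCT = set(",.;:!?,。;:!?、،؛؟")

# Allowed bead shapes (n source : m target) with assumed, untuned penalties.
BEADS = {(1, 1): 0.0, (2, 1): 0.5, (1, 2): 0.5, (1, 0): 1.0, (0, 1): 1.0}


def features(text: str) -> Tuple[int, int]:
    """Return (character length, punctuation count) of a text span."""
    return len(text), sum(ch in PUNCT for ch in text)


def find_anchors(src: List[str], tgt: List[str], lexicon: Dict[str, str],
                 ratio: float = 1.0, tol: float = 0.5) -> List[Tuple[int, int]]:
    """Greedy, monotone search for 1:1 anchors: a pair qualifies when a lexicon
    entry occurs on both sides and the length ratio is within `tol` of `ratio`."""
    anchors, next_j = [], 0
    for i, s in enumerate(src):
        for j in range(next_j, len(tgt)):
            t = tgt[j]
            lexical = any(sw in s and tw in t for sw, tw in lexicon.items())
            length_ok = abs(len(t) / max(len(s), 1) - ratio) <= tol
            if lexical and length_ok:
                anchors.append((i, j))
                next_j = j + 1  # keep anchors strictly increasing on both sides
                break
    return anchors


def align_segment(src: List[str], tgt: List[str], ratio: float = 1.0,
                  punct_weight: float = 1.0) -> List[Bead]:
    """Dynamic programming over one segment, scoring each bead by normalized
    length difference plus normalized punctuation-count difference."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                if i + di > n or j + dj > m:
                    continue
                sl, sp = features("".join(src[i:i + di]))
                tl, tp = features("".join(tgt[j:j + dj]))
                sl_scaled = sl * ratio  # expected target length for this source span
                c = (cost[i][j] + penalty
                     + abs(sl_scaled - tl) / max(sl_scaled + tl, 1)
                     + punct_weight * abs(sp - tp) / max(sp + tp, 1))
                if c < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = c
                    back[i + di][j + dj] = (di, dj)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):  # trace the cheapest path back to the start
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]


def align(src: List[str], tgt: List[str], lexicon: Dict[str, str]) -> List[Bead]:
    """Split the text at anchors, then align each in-between segment on its own,
    so a mistake in one segment cannot spread past the next anchor."""
    beads: List[Bead] = []
    pi = pj = 0
    for ai, aj in find_anchors(src, tgt, lexicon) + [(len(src), len(tgt))]:
        for s_idx, t_idx in align_segment(src[pi:ai], tgt[pj:aj]):
            beads.append(([pi + k for k in s_idx], [pj + k for k in t_idx]))
        if ai < len(src):  # the final (len, len) pair is only a sentinel
            beads.append(([ai], [aj]))
        pi, pj = ai + 1, aj + 1
    return beads


# Toy usage: the second sentence pair becomes the anchor.
src = ["aaaa.", "the cat sat.", "bbbb.", "cccc."]
tgt = ["AAAA.", "THE CAT SAT.", "BBBB, CCCC."]
beads = align(src, tgt, {"cat": "CAT"})
print(beads)  # [([0], [0]), ([1], [1]), ([2, 3], [2])]
```

Because each segment is aligned independently, the dynamic program only ever sees the sentences between two anchors, which is the sense in which anchors bound error propagation. It also keeps the quadratic alignment step small when anchors are frequent.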
