Computer Science ›› 2010, Vol. 37 ›› Issue (2): 171-174.

• Artificial Intelligence •

A Word Segmentation Method for Military Texts Based on Term Combination

Huang Wei, Gao Bing, Liu Yi, Yang Kewei

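  1. (College of Information System and Management, National University of Defense Technology, Changsha 410073); (College of Liberal Arts, Hunan Normal University, Changsha 410081)
  • Publication date: 2018-12-01 Release date: 2018-12-01
  • Supported by:
    This work was supported by the 11th Five-Year Plan Weapons and Equipment Advance Research Project (513300102).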

Word Segmentation Approach in Military Text on the Basis of Word Combination

HUANG Wei, GAO Bing, LIU Yi, YANG Ke-wei

  • Online: 2018-12-01 Published: 2018-12-01

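Abstract (translated from Chinese): Traditional word segmentation methods, when applied to military texts, suffer from a large number of out-of-vocabulary words and from incomplete feature information for some terms. To address this, the segmentation process is decomposed into several sub-processes, and military texts are segmented with the word string as the unit. First, the text is scanned bidirectionally against a dictionary and ambiguous segments are marked. Stop words are then removed from the fields on which both scans agree. Next, mutual information and adjacent co-occurrence frequency are computed between the terms produced by the first segmentation pass, and based on these values the corresponding terms are combined into word strings and marked. Finally, the marked ambiguous fields and word strings are extracted for manual review. Experimental results show that word strings formed by combining terms carry richer feature information and give better segmentation results.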

Keywords: military, text, word segmentation, terms
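
The dictionary-based bidirectional scan described in the abstract can be sketched as follows. The abstract does not say which matching strategy the authors use, so this sketch assumes forward and backward maximum matching; the function names, the `max_len` parameter, and the toy lexicon are all hypothetical. As a simplification, it flags the whole input as ambiguous when the two scans disagree, rather than isolating the individual ambiguous fields.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right longest-match segmentation (assumed strategy)."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no lexicon entry matches.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len=4):
    """Greedy right-to-left longest-match segmentation (assumed strategy)."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def bidirectional_scan(text, lexicon, max_len=4):
    """Return both segmentations and a flag that is True when they differ."""
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    return fwd, bwd, fwd != bwd


def remove_stop_words(tokens, stop_words):
    """Drop stop words; intended for fields where both scans agree."""
    return [t for t in tokens if t not in stop_words]
```

With the toy lexicon `{"研究", "研究生", "生命", "起源"}`, the input `研究生命起源` is segmented as `研究生 / 命 / 起源` by the forward scan and as `研究 / 生命 / 起源` by the backward scan, so it would be marked as ambiguous and left for later review.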

Abstract: Because military texts contain many unknown words and the feature information of some words is incomplete, a word segmentation method that uses the lexical chunk as its unit is proposed. Segmentation is divided into several steps: the text is scanned bidirectionally against a dictionary and ambiguous segments are marked; stop words are deleted from the segments on which both scans agree; mutual information and adjacency frequency are then computed for the words produced by the first segmentation pass; based on these results, the relevant words are judged to form lexical chunks and are marked. Finally, the marked ambiguous segments and lexical chunks are extracted for manual review. Experiments show that, after word combination, the lexical chunks carry richer feature information and give better segmentation results.

Key words: Military, Text, Word segmentation, Words
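
The word-combination step can be illustrated with a minimal sketch. It assumes pointwise mutual information (PMI) as the mutual-information measure and a raw count of adjacent co-occurrences as the adjacency frequency; the abstract gives neither the exact formulas nor the thresholds, so `min_pmi`, `min_count`, and both function names are hypothetical. The sketch also merges only pairs of adjacent words, greedily from left to right, whereas the method described may build longer chunks.

```python
import math
from collections import Counter


def score_adjacent_pairs(sentences):
    """Map each adjacent pair (a, b) to (PMI, adjacency count).

    `sentences` is a list of token lists from the first segmentation pass.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = (math.log2(p_ab / (p_a * p_b)), count)
    return scores


def combine_terms(tokens, scores, min_pmi=1.0, min_count=2):
    """Merge adjacent words whose PMI and count both reach the thresholds."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            pair = (tokens[i], tokens[i + 1])
            pmi, count = scores.get(pair, (float("-inf"), 0))
            if pmi >= min_pmi and count >= min_count:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2  # skip the word that was just absorbed
                continue
        merged.append(tokens[i])
        i += 1
    return merged
```

Requiring both thresholds is a design choice of this sketch: a high PMI alone can come from a pair seen only once, so the count guards against merging rare coincidences. In the workflow the abstract describes, the merged chunks would still be marked and passed to a human reviewer rather than accepted automatically.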
