Which is More Effective for Chinese Lexical Analysis via Character Tagging: Above-context Versus Below-context

Computer Science ›› 2012, Vol. 39 ›› Issue (11): 201-203.


Yu Jiangde, Wang Xijie, Fan Xiaozhong

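(School of Computer and Information Engineering, Anyang Normal University, Anyang 455000); (School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081)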

Publication date: 2018-11-16 Release date: 2018-11-16

Which is More Effective for Chinese Lexical Analysis via Character Tagging: Above-context Versus Below-context

Online:2018-11-16 Published:2018-11-16

Abstract

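Abstract (translated from the Chinese): Chinese lexical analysis is the foundation of Chinese information processing. The mainstream techniques for it today are statistical, and all of them in essence treat lexical analysis as a sequence labeling problem. Context is the resource that statistical methods must rely on to acquire linguistic knowledge and to solve many practical problems in natural language processing. Chinese lexical analysis has to draw linguistic knowledge from the context, but are the above-context (the preceding text) and the below-context (the following text) equally important? To avoid relying on guesses based on subjective experience alone, this work studies the three subtasks of character-tagging-based Chinese lexical analysis, namely word segmentation, POS tagging, and named entity recognition, and compares the effect of the above-context and the below-context on the performance of each task. Closed tests were run on several corpora from the international Chinese language processing Bakeoff, with comparative experiments using feature template sets that represent the above-context and the below-context separately. The results show that, within the character-tagging framework, the below-context contributes more than 6 percentage points more to Chinese lexical analysis performance than the above-context does.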

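Keywords (translated from the Chinese): Chinese lexical analysis, character tagging, context, word segmentation, POS tagging, named entity recognition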

Abstract: Chinese lexical analysis is a foundational task for Chinese information processing. At present, the mainstream technology for Chinese lexical analysis is based on statistical methods, which treat the analysis process as a sequence labeling problem. Context is the essential resource both for acquiring linguistic knowledge in statistical methods and for solving practical problems in natural language processing. Chinese lexical analysis needs the help of the surrounding context. However, are the above-context and the below-context equally important? To avoid relying on guesses based on subjective experience, we studied the contributions of the above-context and the below-context to character-tagging-based Chinese lexical analysis through a large number of experiments on word segmentation, POS tagging, and named entity recognition. Closed evaluations were performed on several corpora from the international Chinese language processing Bakeoff, and comparative experiments were performed with different feature template sets that describe the above-context and the below-context separately. Experimental results show that, under the character-tagging framework, performance with the below-context is more than 6 percentage points higher than with the above-context.

Key words: Chinese lexical analysis, Character tagging, Context, Word segmentation, POS tagging, Named entity recognition
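
To make the contrast between above-context and below-context feature templates concrete, the following is a minimal Python sketch of character tagging with one-sided context features. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the tag set, the window size, or the exact templates, so the B/M/E/S tags, the two-character window, the unigram-only templates, the `<PAD>` boundary symbol, and the function names `to_bmes` and `context_features` are all hypothetical choices made here.

```python
from typing import List


def to_bmes(words: List[str]) -> List[str]:
    """Convert a segmented sentence into one tag per character.

    Assumed tag set (not stated in the abstract):
    B = word begin, M = word middle, E = word end, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def context_features(chars: List[str], i: int, side: str, window: int = 2) -> List[str]:
    """Unigram features for the character at position i, using one side only.

    side="above" looks only at preceding characters (C-1, C-2, ...);
    side="below" looks only at following characters (C+1, C+2, ...).
    The current character C0 is always included (an assumption).
    """
    if side not in ("above", "below"):
        raise ValueError("side must be 'above' or 'below'")
    step = -1 if side == "above" else 1
    feats = [f"C0={chars[i]}"]
    for k in range(1, window + 1):
        j = i + step * k
        # Positions outside the sentence are padded with a boundary symbol.
        neighbor = chars[j] if 0 <= j < len(chars) else "<PAD>"
        feats.append(f"C{step * k:+d}={neighbor}")
    return feats


if __name__ == "__main__":
    words = ["汉语", "词法", "分析"]
    chars = list("".join(words))
    print(to_bmes(words))
    print(context_features(chars, 2, "above"))
    print(context_features(chars, 2, "below"))
```

Training one sequence labeler on the `"above"` features and another on the `"below"` features, then comparing their scores on the same test set, is the kind of comparison the abstract describes; the choice of learner is left open here because the abstract does not name one.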

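Yu Jiangde, Wang Xijie, Fan Xiaozhong. Which is More Effective for Chinese Lexical Analysis via Character Tagging: Above-context Versus Below-context[J]. Computer Science, 2012, 39(11): 201-203. https://doi.org/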
