Quantitative Measurement of Training Sample Capacity for Chinese Statistical Language Models

Computer Science, 2009, Vol. 36, Issue (10): 222-224.

Statistical Language Model

ZHANG Yang-sen

Online: 2018-11-16   Published: 2018-11-16

Abstract

Parameter training is the key step in statistical language modeling. How many training samples are needed to keep the parameter estimation error within a required bound is one of the central questions in language modeling theory. Applying mathematical statistics, we present a method for estimating the lower bound on the training sample capacity of a Chinese language model and propose a quantitative estimation formula. With this formula, the corpus sample capacity needed to train the model parameters can be calculated from the required parameter estimation error.
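The abstract does not reproduce the paper's formula, so the following is only an illustrative sketch of the kind of calculation involved, not the author's method. It assumes the textbook normal-approximation bound for estimating a probability p by relative frequency from n independent trials: requiring the relative error to stay within δ with confidence 1 − α gives n ≥ z²(1 − p)/(δ²p), where z is the two-sided normal critical value. Treating corpus tokens as independent trials is a strong simplifying assumption, and the helper names `min_sample_size` and `corpus_lower_bound` are hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size(p: float, rel_error: float, confidence: float = 0.95) -> int:
    """Smallest n such that the relative-frequency estimate of p stays within
    rel_error * p of p with the given confidence (normal approximation).

    Hypothetical helper: assumes independent Bernoulli trials, which real
    text is not; this is NOT the formula proposed in the paper.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if rel_error <= 0.0:
        raise ValueError("rel_error must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Two-sided critical value z_{1 - alpha/2}, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Solve z * sqrt(p(1-p)/n) <= rel_error * p for n.
    return math.ceil(z * z * (1.0 - p) / (rel_error * rel_error * p))


def corpus_lower_bound(probs, rel_error: float, confidence: float = 0.95) -> int:
    """Hypothetical helper: the rarest parameter needs the most data,
    so it determines the lower bound for the whole corpus."""
    return max(min_sample_size(p, rel_error, confidence) for p in probs)


if __name__ == "__main__":
    # Illustrative (made-up) parameter probabilities, 10% relative error.
    print(corpus_lower_bound([1e-2, 1e-3, 1e-4], rel_error=0.1))
```

With these made-up values the bound comes out at roughly 3.84 million tokens, set entirely by the rarest probability (1e-4); halving the allowed relative error roughly quadruples it, since n scales with 1/δ².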

Key words: Chinese statistical language model, Training corpus sample, Sample capacity, Relative error

ZHANG Yang-sen. Statistical Language Model[J]. Computer Science, 2009, 36(10): 222-224.