计算机科学 (Computer Science), 2008, Vol. 35, Issue 5: 127-130.


A Fast Frequent String Extraction Algorithm Based on Level-wise Scanning

  

  • Online: 2018-11-16 Published: 2018-11-16

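Abstract: String frequency statistics is a simple and effective way to extract unlisted (out-of-vocabulary) words. This paper proposes a fast method for extracting and counting frequent strings: it discovers frequent strings quickly through a level-wise scan, corrects each string's effective occurrence count, and finally extracts the strings whose average mutual information reaches a threshold. Experimental results show that the method is effective and feasible.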

Keywords: frequent string, Chinese word extraction, level-wise scan, mutual information

Abstract: String frequency statistics is a simple and effective method for extracting unlisted words. This paper presents an effective algorithm for extracting frequent strings. It uses a level-wise scan to find frequent strings rapidly and modifies the valid freq

Key words: Frequent string, Chinese automatic word extraction, Level-wise scan, Mutual information
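
The three stages named in the abstract (level-wise scan, frequency correction, mutual-information filtering) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not give the pruning rule, the correction formula, or the exact definition of average mutual information, so all three are assumptions here. The sketch assumes Apriori-style pruning (a string of length k is counted only if both of its length k-1 substrings were frequent), a correction that discounts occurrences lying inside a longer frequent string, and an average of pointwise mutual information over all two-way splits. The function names and the parameters `min_freq`, `max_len`, and `mi_threshold` are hypothetical.

```python
from collections import Counter
from math import log2


def frequent_strings(text, min_freq=2, max_len=4):
    """Level-wise scan (assumed Apriori-style pruning).

    Returns {length: {string: raw count}} for strings seen >= min_freq times.
    """
    levels = {}
    prev = None  # frequent strings of the previous level
    for k in range(1, max_len + 1):
        counts = Counter()
        for i in range(len(text) - k + 1):
            s = text[i:i + k]
            # Count s only if its prefix and suffix were frequent one level down.
            if prev is None or (s[:-1] in prev and s[1:] in prev):
                counts[s] += 1
        prev = {s: c for s, c in counts.items() if c >= min_freq}
        if not prev:
            break
        levels[k] = prev
    return levels


def effective_counts(text, levels):
    """Assumed correction: ignore occurrences covered by a frequent string
    that is one character longer (extended to the left or to the right)."""
    eff = {}
    for k, strings in levels.items():
        longer = levels.get(k + 1, {})
        for s in strings:
            n = 0
            start = text.find(s)
            while start != -1:
                left = text[start - 1:start + k] if start > 0 else ""
                right = text[start:start + k + 1]
                if left not in longer and right not in longer:
                    n += 1
                start = text.find(s, start + 1)
            eff[s] = n
    return eff


def avg_mutual_information(s, levels, total):
    """Assumed form: mean pointwise MI over every two-way split of s (len >= 2)."""
    def p(x):
        return levels[len(x)][x] / total
    scores = [log2(p(s) / (p(s[:j]) * p(s[j:]))) for j in range(1, len(s))]
    return sum(scores) / len(scores)


def extract_words(text, min_freq=2, max_len=4, mi_threshold=1.0):
    """Keep multi-character strings whose corrected count and average MI pass."""
    levels = frequent_strings(text, min_freq, max_len)
    eff = effective_counts(text, levels)
    total = len(text)
    return sorted(
        s
        for k, strings in levels.items() if k >= 2
        for s in strings
        if eff[s] >= min_freq
        and avg_mutual_information(s, levels, total) >= mi_threshold
    )
```

On the toy input `"xabcyabczabcw"`, `extract_words` returns `['abc']`: the fragments `ab` and `bc` are frequent at level 2, but every one of their occurrences lies inside the longer frequent string `abc`, so their corrected counts drop to zero. The lookups in `avg_mutual_information` are safe because every substring of a frequent string is itself frequent and therefore present in `levels`.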
