Research and Application of an HMM-Based Web Information Extraction Algorithm

Computer Science, 2010, Vol. 37, Issue (2): 203-206.

祝伟华,卢熠,刘斌斌

(School of Software, Chongqing University, Chongqing 400044)

Publication date: 2018-12-01  Release date: 2018-12-01
Funding:
This work was supported by a National Natural Science Foundation of China project (No. 101022820080079).

Improvement of Web Information Extraction Algorithm Based on

ZHU Wei-hua, LU Yi, LIU Bin-bin

Online: 2018-12-01  Published: 2018-12-01

Abstract

Abstract: With the rapid development of Internet technology, the volume of online information is growing exponentially, and automatically extracting structured information from this mass of unstructured online text has become an important research topic. This paper studies Web information extraction algorithms based on the hidden Markov model (HMM), focusing on how the HMM should be applied to text information extraction and how the data should be labeled. It proposes several improvements to the use of the HMM in text information extraction, builds an HMM-based Web information extraction model, and analyzes and compares the extracted data to verify the effectiveness of the improved algorithm.

Key words: hidden Markov model (HMM), information extraction, machine learning
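
For readers unfamiliar with how an HMM labels text, the following is a minimal sketch of standard Viterbi decoding, in which hidden states stand for fields to extract and observations are tokens. It is not the improved algorithm of this paper, whose details the abstract does not give. The state names, the toy probabilities, the `floor` smoothing value, and the function name `viterbi` are all illustrative assumptions.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely hidden-state (field label) sequence for tokens."""
    if not tokens:
        return []

    def lp(p):
        # Log-probability with a small floor so unseen events are not log(0).
        return math.log(max(p, floor))

    # scores[t][s]: best log-probability of any path ending in state s at step t.
    scores = [{s: lp(start_p[s]) + lp(emit_p[s].get(tokens[0], 0.0)) for s in states}]
    back = [{}]
    for t in range(1, len(tokens)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at step t.
            prev = max(states, key=lambda r: scores[t - 1][r] + lp(trans_p[r][s]))
            scores[t][s] = (scores[t - 1][prev] + lp(trans_p[prev][s])
                            + lp(emit_p[s].get(tokens[t], 0.0)))
            back[t][s] = prev

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model (assumed values, for illustration only).
STATES = ["title", "author"]
START_P = {"title": 0.9, "author": 0.1}
TRANS_P = {
    "title": {"title": 0.7, "author": 0.3},
    "author": {"title": 0.1, "author": 0.9},
}
EMIT_P = {
    "title": {"web": 0.5, "extraction": 0.5},
    "author": {"zhu": 0.5, "lu": 0.5},
}

labels = viterbi(["web", "extraction", "zhu", "lu"], STATES, START_P, TRANS_P, EMIT_P)
print(labels)
```

In practice the start, transition, and emission probabilities would be estimated from labeled training text rather than written by hand, which is where the data-labeling question raised in the abstract comes in.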

ZHU Wei-hua, LU Yi, LIU Bin-bin. Research and Application of an HMM-Based Web Information Extraction Algorithm[J]. Computer Science, 2010, 37(2): 203-206. https://doi.org/

ZHU Wei-hua,LU Yi,LIU Bin-bin. Improvement of Web Information Extraction Algorithm Based on[J]. Computer Science, 2010, 37(2): 203-206. https://doi.org/
