A Method for Extracting Ontologies from Web Pages of Unknown Structure

Computer Science (计算机科学) ›› 2009, Vol. 36 ›› Issue (2): 186-189.

Online:2018-11-16 Published:2018-11-16

Summary/Abstract

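Summary: Most data on the Web is structured, but its structure is not known in advance, so the data of interest cannot be queried effectively. This paper proposes a text-independent method for extracting an ontology. The process comprises table understanding, data integration, and ontology generation. Table understanding locates the tables of interest, identifies and matches attributes and values, and forms records; data integration matches the source records against the target schema; ontology generation extracts the data of the source records into the target schema. The results show that the method can effectively extract data of unknown structure by means of a known target schema.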
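
The three stages named in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names, the synonym-based attribute matching, the "first row is the header" rule, and the sample page and target schema are all assumptions made for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every row of every <table> in a page."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def understand_table(rows):
    """Table understanding (assumed rule): first row holds attributes, the rest values."""
    header, *body = rows
    return [dict(zip(header, values)) for values in body]


def match_attributes(source_attrs, target_schema):
    """Data integration (assumed rule): match source attributes by synonym lookup."""
    mapping = {}
    for target, synonyms in target_schema.items():
        for attr in source_attrs:
            if attr.lower() in synonyms:
                mapping[attr] = target
    return mapping


def extract(records, mapping):
    """Final stage: copy the matched values of each source record into the target schema."""
    return [{mapping[a]: v for a, v in rec.items() if a in mapping} for rec in records]


def run_pipeline(page, target_schema):
    parser = TableParser()
    parser.feed(page)
    results = []
    for rows in parser.tables:
        if not rows:
            continue
        mapping = match_attributes(rows[0], target_schema)
        if mapping:  # a table counts as "of interest" only if some attribute matches
            results.extend(extract(understand_table(rows), mapping))
    return results


# Hypothetical target schema: target attribute -> accepted source spellings.
TARGET_SCHEMA = {"model": {"model", "name"}, "price": {"price", "cost"}}

# Hypothetical source page whose column names differ from the target schema.
PAGE = """
<table>
  <tr><th>Name</th><th>Cost</th><th>Colour</th></tr>
  <tr><td>A100</td><td>250</td><td>red</td></tr>
  <tr><td>B200</td><td>310</td><td>blue</td></tr>
</table>
"""

print(run_pipeline(PAGE, TARGET_SCHEMA))
```

The source table uses "Name" and "Cost" where the target schema expects "model" and "price"; the unmatched "Colour" column is dropped. A real system would need a far richer matcher than a fixed synonym set.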

Abstract: To the user, the structure of the data in HTML tables on the Web is usually unknown; thus, the data of interest cannot be queried directly. We present a solution to this problem. The solution entails the understanding of table elements, data integration and wrap

Key words: Hetero-data integration, Semantic correspondence, Table understanding, Ontology extraction
