Computer Science ›› 2010, Vol. 37 ›› Issue (3): 156-158, 174.

• Software Engineering and Database Technology •

Deep Web Data Extraction Based on Semantic Support

高明,王继成,李江峰   

  1. (College of Electronics and Information Engineering, Tongji University, Shanghai 201804)
  • Publication date: 2018-12-01 Release date: 2018-12-01

WDB Data Extraction Based on Semantic Support

GAO Ming, WANG Ji-cheng, LI Jiang-feng

  • Online:2018-12-01 Published:2018-12-01

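Abstract: Based on an analysis of how Deep Web queries are implemented, this paper presents an algorithm that, with the support of a semantic ontology, uses machine learning to fill in query interfaces automatically and thereby extract data automatically. A two-dimensional table is constructed whose columns are the controls extracted from the Deep Web query interface page, and tuples are added to the table by assigning values to each control. The outcome of each submission, namely whether data extraction succeeded or failed, guides classification learning, and the learned result is finally used to construct request strings automatically and complete the data extraction. Experiments show that the algorithm performs well.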

Keywords: data extraction, semantics, machine learning, Deep Web

Abstract: Based on an analysis of the mechanism of Deep Web queries, this paper presents an algorithm that fills in the query interface using machine learning and can thus extract data automatically. First, a two-dimensional table is constructed whose columns are the controls extracted from the pages of the Deep Web query interface. Then tuples are added to the table by assigning values to all of the controls. Next, classification learning is carried out, guided by whether each data extraction succeeded or failed. Finally, the data are extracted by automatically constructing request strings from the learned results. Experiments show that the algorithm performs effectively.

Key words: Data extraction, Semantic, Machine learning, Deep Web
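
The pipeline summarized in the abstract (control table, value assignment, success/failure labels, learned request string) might be sketched roughly as follows. This is only an illustration under stated assumptions: the control names, the candidate values (which the abstract suggests would come from a semantic ontology), and the `fake_submit` rule are all hypothetical, and a simple per-value success-rate estimate stands in for the classifier, which the abstract does not specify.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical controls extracted from a query-interface page:
# control name -> candidate values (an empty string means "left blank").
CONTROLS = {
    "title": ["", "web"],
    "category": ["books", "music"],
    "sort": ["asc", "desc"],
}

def build_table(controls):
    """Each row (tuple) assigns one value to every control (column)."""
    names = list(controls)
    return [dict(zip(names, values)) for values in product(*controls.values())]

def fake_submit(row):
    """Stand-in for submitting the form; a real probe would send an HTTP request.
    Assumed rule: the query succeeds only if 'title' is non-empty."""
    return row["title"] != ""

def learn(rows, labels):
    """Estimate, per (control, value) pair, the fraction of successful probes."""
    counts = {}
    for row, ok in zip(rows, labels):
        for pair in row.items():
            hits, total = counts.get(pair, (0, 0))
            counts[pair] = (hits + int(ok), total + 1)
    return {pair: hits / total for pair, (hits, total) in counts.items()}

def best_request(controls, rates):
    """Pick the highest-scoring value per control and encode the request string."""
    chosen = {
        name: max(values, key=lambda v: rates.get((name, v), 0.0))
        for name, values in controls.items()
    }
    return urlencode(chosen)

rows = build_table(CONTROLS)
labels = [fake_submit(row) for row in rows]   # success/failure of each probe
rates = learn(rows, labels)
request_string = best_request(CONTROLS, rates)
print(request_string)
```

In this toy setup only the `title` control influences success, so the learned rates separate its two values while the other controls tie; a real system would replace `fake_submit` with actual form submissions and the frequency table with a proper classifier.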
