Computer Science, 2011, Vol. 38, Issue (10): 199-201.

Artificial Intelligence

Effective Approach to Deep Web Entries Identification

WU Chun-ming, XIE De-ti

(College of Computer and Information Science, Southwest University, Chongqing 400715)
(College of Resources and Environment, Southwest University, Chongqing 400715)

Online: 2018-11-16  Published: 2018-11-16

Abstract: Automatic identification of deep Web entries is the prerequisite and foundation of deep Web data integration. Because forms are designed with considerable freedom, deep Web entries follow no unified construction standard, and deterministic rules cannot reliably decide whether a given form is a deep Web entry. Based on statistical features, this paper first extracts a set of form attributes that distinguish deep Web entries (searchable forms) from non-entries (non-searchable forms). On this basis, it proposes a method that identifies deep Web entries automatically with a neural network. Unlike rule-based methods, the neural network is trained from examples and needs no prior knowledge, which makes it well suited to deep Web entries whose presentation is complex and varied. Experimental results show that the method is effective.

Key words: Deep Web entries, Neural network, Feature extraction, Machine learning
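
The abstract describes the pipeline but does not list the concrete form attributes or the network configuration. As a rough illustration only, the Python sketch below extracts a few per-form counts with the standard-library `html.parser` and trains a one-hidden-layer network by backpropagation to separate searchable from non-searchable forms. Everything specific here is an assumption, not the paper's method: the six features in `FEATURES`, the cue words in `SEARCH_WORDS`, the hypothetical helpers `FormFeatureParser`, `extract_form_features` and `TinyMLP`, the hyperparameters, and the four toy pages in `TOY_PAGES`.

```python
import math
import random
from html.parser import HTMLParser

SEARCH_WORDS = ("search", "query", "find", "keyword")  # hypothetical cue words
FEATURES = ("text", "password", "select", "textarea", "check", "kw")  # hypothetical features

class FormFeatureParser(HTMLParser):
    """Builds one hypothetical feature vector per <form> element."""

    def __init__(self):
        super().__init__()
        self.forms = []   # finished feature vectors, in document order
        self._cur = None  # counters for the form being parsed, or None

    def handle_starttag(self, tag, attrs):
        attrs = {k: (v or "") for k, v in attrs}
        if tag == "form":
            self._cur = dict.fromkeys(FEATURES, 0)
        if self._cur is None:
            return  # ignore controls outside any form
        # Any attribute value containing a cue word marks the form as search-like.
        if any(w in v.lower() for v in attrs.values() for w in SEARCH_WORDS):
            self._cur["kw"] = 1
        if tag == "input":
            kind = attrs.get("type", "text").lower()  # a missing type means text
            if kind in ("text", "search"):
                self._cur["text"] += 1
            elif kind == "password":
                self._cur["password"] += 1
            elif kind in ("checkbox", "radio"):
                self._cur["check"] += 1
        elif tag in ("select", "textarea"):
            self._cur[tag] += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._cur is not None:
            self.forms.append([float(self._cur[f]) for f in FEATURES])
            self._cur = None

def extract_form_features(html):
    """Returns one feature vector per complete <form> in the page."""
    parser = FormFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.forms

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """Illustrative network (not the paper's): one sigmoid hidden layer, sigmoid output, SGD on log loss."""

    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def predict(self, x):
        return self._forward(x)[1]  # estimated probability of a deep Web entry

    def train(self, xs, ys, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, t in zip(xs, ys):
                h, y = self._forward(x)
                d_out = y - t  # log-loss gradient w.r.t. the output pre-activation
                for j, hj in enumerate(h):
                    d_hid = d_out * self.w2[j] * hj * (1.0 - hj)  # uses w2 before its update
                    self.w2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hid * xi
                    self.b1[j] -= lr * d_hid
                self.b2 -= lr * d_out

# Toy, hand-labeled pages (hypothetical): 1 = deep Web entry, 0 = not an entry.
TOY_PAGES = [
    '<form action="/search"><input name="q"></form>',
    '<form><input name="title"><select></select><input type="submit" value="Find"></form>',
    '<form action="/login"><input name="user"><input type="password" name="pw"></form>',
    '<form action="/subscribe"><input name="email"><input type="checkbox"></form>',
]
TOY_LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    xs = [extract_form_features(page)[0] for page in TOY_PAGES]
    net = TinyMLP(n_in=len(FEATURES))
    net.train(xs, TOY_LABELS)
    print([round(net.predict(x), 3) for x in xs])
```

Running the file trains on the toy pages and prints one score per form. Because the classifier is learned from labeled examples rather than written as rules, supporting a new style of form means adding labeled pages instead of new rules; four toy pages say nothing about real-world accuracy.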
