Computer Science ›› 2012, Vol. 39 ›› Issue (4): 177-180.

• Databases and Data Mining •

Research on Deep Web Classification Based on Domain Feature Text

WU Chunming, XIE Deti

  1. (College of Computer and Information Science, Southwest University, Chongqing 400715); (College of Resources and Environment, Southwest University, Chongqing 400715)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: Automatic Deep Web classification is the prerequisite and foundation for building a Deep Web data integration system. This paper proposes a Deep Web classification method based on domain feature text. First, ontology knowledge is used to abstract different words that express the same semantics into concepts. A definition of domain correlation is then given and used as the quantitative criterion for feature text selection, which avoids the subjectivity and uncertainty of manual selection. In constructing the interface vector space model, the differing contributions of feature texts to classification are taken into account, and an improved weighting method named W-TFIDF is proposed. Finally, the KNN algorithm is used to classify the interface vectors. Comparative experiments show that the feature text selected by the proposed method is accurate and effective, and that the new weighting method significantly improves classification precision and shows good stability in KNN classification.

Key words: Feature text, Domain classification, Vector space model, Deep Web
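
The pipeline summarized in the abstract (weight the feature text of each query interface, build interface vectors, classify with KNN) can be illustrated with a minimal sketch. This page does not give the paper's domain correlation definition or its improved weighting formula, so the code below uses plain smoothed TF-IDF and a hypothetical per-term multiplier `domain_weight` as a stand-in. The toy interfaces, labels, and all function names are assumptions made for illustration only, not the authors' implementation.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency over tokenized interfaces."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf, term_weight=None):
    """Sparse TF-IDF vector; term_weight is a hypothetical stand-in for
    a domain-relevance factor (the paper's formula is not given here)."""
    term_weight = term_weight or {}
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] * term_weight.get(t, 1.0)
            for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query, labeled, k=3):
    """Majority vote among the k labeled vectors most similar to query."""
    ranked = sorted(labeled, key=lambda p: cosine(query, p[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labeled interfaces (assumed data): form-field text per interface.
train = [
    (["title", "author", "isbn", "publisher"], "books"),
    (["title", "author", "keyword", "price"], "books"),
    (["departure", "destination", "date", "passengers"], "flights"),
    (["from", "to", "departure", "return", "date"], "flights"),
]
# Hypothetical multipliers favouring strongly domain-specific terms.
domain_weight = {"isbn": 2.0, "departure": 2.0}

idf = fit_idf([tokens for tokens, _ in train])
labeled = [(vectorize(tokens, idf, domain_weight), label)
           for tokens, label in train]

query = vectorize(["author", "title", "price"], idf, domain_weight)
print(knn_classify(query, labeled, k=3))
```

In this sketch the weighting step and the classifier are kept separate, so a different weighting scheme can be swapped in by changing only `vectorize`.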
