Computer Science, 2013, Vol. 40, Issue (11): 228-230.

• Software and Database Technology •

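Research on Deep Web Query Interface Clustering Based on Latent Semantic Analysis

QIANG Bao-hua, LI Wei, ZOU Xian-chun, WANG Tian-tian, WU Chun-ming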

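  1. School of Computer Science and Engineering, Guilin University of Electronic Technology, Guangxi 541000; School of Computer Science and Engineering, Guilin University of Electronic Technology, Guangxi 541000; College of Computer and Information Science, Southwest University, Chongqing 400715; School of Computer Science and Engineering, Guilin University of Electronic Technology, Guangxi 541000; College of Computer and Information Science, Southwest University, Chongqing 400715
  • Published: 2018-11-16 Online: 2018-11-16
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61163057), the Guangxi Natural Science Foundation (2012jjAAG0063), the Open Fund of the Guangxi Key Laboratory of Trusted Software (KX201117), and the Guangxi Graduate Research Innovation Project (YCSZ2012070).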


Abstract: Generating an integrated query interface is an important part of Deep Web data integration. Clustering query interfaces from different domains effectively is one of the core problems to be solved when generating integrated query interfaces. The traditional vector space model relies solely on keyword matching when clustering Deep Web query interfaces. To address this shortcoming, latent semantic analysis (LSA) is introduced to uncover the semantic relationships among query interfaces, and a Deep Web query interface clustering algorithm based on LSA is presented. Experiments were carried out on data from the UIUC Web integration repository. The results show that LSA increases the similarity between query interfaces in the same domain and markedly improves the quality of Deep Web query interface clustering.

Key words: Latent semantic analysis, Singular value decomposition, Deep Web, Query interface clustering
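
The contrast the abstract draws between keyword matching and LSA can be illustrated with a small sketch. This is not the authors' algorithm, whose details this page does not give. It assumes invented toy interfaces, raw term counts as weights, a rank-2 truncated SVD computed with NumPy, and a simple greedy threshold clustering; the function names and the threshold are hypothetical choices made only for illustration.

```python
import numpy as np

# Toy query interfaces, each reduced to a bag of attribute labels.
# The data is invented for illustration; it is not from the UIUC repository.
interfaces = {
    "books_a": ["title", "author"],
    "books_b": ["author", "isbn"],
    "books_c": ["isbn", "publisher"],
    "air_a": ["departure", "arrival"],
    "air_b": ["arrival", "date"],
    "air_c": ["date", "passengers"],
}


def term_document_matrix(docs):
    """Build a terms x documents matrix of raw term counts."""
    vocab = sorted({term for terms in docs.values() for term in terms})
    row = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for col, terms in enumerate(docs.values()):
        for term in terms:
            X[row[term], col] += 1.0
    return X


def cosine_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def lsa_document_vectors(X, k):
    """Project documents onto the top-k singular directions of X."""
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T * s[:k]  # one row per document, k columns


def leader_clustering(sim, threshold):
    """Greedy clustering: join the first cluster whose leader is similar enough."""
    clusters = []
    for i in range(sim.shape[0]):
        for cluster in clusters:
            if sim[i, cluster[0]] >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


X = term_document_matrix(interfaces)
raw_sim = cosine_matrix(X.T)                           # keyword matching (VSM)
lsa_sim = cosine_matrix(lsa_document_vectors(X, k=2))  # latent semantic space
clusters = leader_clustering(lsa_sim, threshold=0.5)

names = list(interfaces)
print("VSM cosine(books_a, books_c):", round(float(raw_sim[0, 2]), 3))
print("LSA cosine(books_a, books_c):", round(float(lsa_sim[0, 2]), 3))
print("clusters:", [[names[i] for i in c] for c in clusters])
```

In this toy data, books_a and books_c share no attribute labels, so their keyword-based cosine similarity is zero. Both co-occur with books_b, and the truncated SVD places them in the same latent direction, so they end up in the same cluster. Real interfaces would need label extraction and a weighting scheme such as TF-IDF, which this sketch leaves out.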

