Computer Science ›› 2010, Vol. 37 ›› Issue (11): 184-189.

• Database and Data Mining •

Selectivity Estimation Based on Zipf Distribution and Attribute Correlation

姜芳艽   

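  1. (Institute of Intelligent Information Processing, Xuzhou Normal University, Xuzhou 221116); (School of Information, Renmin University of China, Beijing 100872)
  • Publication date: 2018-12-01 Release date: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (60773216).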

Selectivity Estimation Based on Zipf Distribution and Attribute Correlation

JIANG Fang-jiao   

  • Online:2018-12-01 Published:2018-12-01

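Abstract: In Deep Web data integration, the integrated query interface and many Web database query interfaces express queries with conjunctive predicates, but a considerable number of Web database query interfaces express queries with exclusive predicates, which means that only one predicate can be selected at a time during query translation. Accurately and efficiently estimating the selectivity of each exclusive query is therefore the key to optimizing query translation. This paper proposes a selectivity estimation method based on the Zipf distribution and attribute correlation. Using the correlation between attributes, an approximately random attribute-level sample of the attribute is obtained from the Web database. On this basis, the Zipf distribution equation of the attribute values is computed, and the selectivity of any value of the infinite-value attribute is then inferred. Experiments show that the method estimates the selectivity of each exclusive query accurately and efficiently.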

Keywords: Zipf distribution, Attribute correlation, Selectivity estimation

Abstract: In Deep Web data integration, some Web database interfaces express exclusive predicates, which permit only one predicate to be selected at a time. Accurately and efficiently estimating the selectivity of each exclusive query is therefore critical to optimal query translation. In this paper, we propose a novel selectivity estimation method. First, we compute the attribute correlation and obtain an approximately random attribute-level sample by submitting queries on the least correlated attribute to the real Web database. Then we compute the Zipf equation from the word-rank information in the sample and the actual selectivity of several words obtained from the real Web database. Finally, the selectivity of any word on the infinite-value attribute is derived from the Zipf equation. An experimental evaluation shows that the proposed method estimates selectivity with high accuracy.

Key words: Zipf distribution, Attribute correlation, Selectivity estimation
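
The abstract does not say how the Zipf equation is computed, so the following is only a minimal sketch of one plausible reading: rank the words in the sample by frequency, probe a few words for their actual selectivity, and fit sel = c / rank^alpha by least squares in log-log space. The function names (`rank_words`, `fit_zipf`, `estimate_selectivity`), the toy data, and the choice of a log-log least-squares fit are all assumptions made for illustration. How the sample is drawn through the least correlated attribute is not modeled here.

```python
import math
from collections import Counter


def rank_words(sample):
    """Rank attribute values by descending sample frequency (rank 1 = most frequent).

    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(sample)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}


def fit_zipf(ranks, probed):
    """Fit log(sel) = log(c) - alpha * log(rank) by ordinary least squares.

    `probed` maps a few words to selectivities measured on the real database
    (hypothetical input; at least two words with distinct ranks are required).
    """
    xs = [math.log(ranks[w]) for w in probed]
    ys = [math.log(s) for s in probed.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = -slope
    c = math.exp(mean_y - slope * mean_x)
    return c, alpha


def estimate_selectivity(word, ranks, c, alpha):
    """Estimate the selectivity of any ranked word from the fitted equation."""
    return c / ranks[word] ** alpha


# Toy data (assumed, not taken from the paper).
sample = ["a"] * 4 + ["b"] * 2 + ["c", "d"]
ranks = rank_words(sample)
probed = {"a": 0.2, "b": 0.1, "d": 0.05}  # assumed probe results
c, alpha = fit_zipf(ranks, probed)
estimate_c = estimate_selectivity("c", ranks, c, alpha)
```

Only the rank order comes from the sample, while the scale of the curve comes from the probed selectivities, which matches the abstract's split between word-rank information and actual selectivity of several words.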
