计算机科学 ›› 2017, Vol. 44 ›› Issue (2): 279-282.doi: 10.11896/j.issn.1002-137X.2017.02.047

• 人工智能 • 上一篇    下一篇

基数估计算法参数的分析与优化

刘绍记,曹阳,崔梦天   

  1. 华南师范大学计算机学院 广州510000,华南师范大学计算机学院 广州510000,西南民族大学计算机科学与技术学院 成都610041
  • 出版日期:2018-11-13 发布日期:2018-11-13
  • 基金资助:
    本文受2015年度广东省高等教育教学改革项目:基于敏捷开发的软件项目实践课程迭代式教学模式设计与实践,广东省水利科技创新项目成果(2014-16),国家自然科学基金项目(61379019),四川省科技计划项目(2015JY002)资助

Parameter Analysis and Optimization of Cardinality Estimation Algorithm

LIU Shao-ji, CAO Yang and CUI Meng-tian   

  • Online:2018-11-13 Published:2018-11-13

摘要: 基数估计算法(Cardinality Estimation Algorithm)是基于概率统计理论的估算给定数据集中不重复元素基数的算法。算法中的Hash函数和相关参数的设置是影响算法性能的两个关键因素。针对这两个问题展开研究,提出了一种基数估计的优化算法,它可以根据数据规模和数据类型动态调整Hash函数和分桶参数,以提高算法的精度和稳定性。实验结果表明,改进的基数估计算法在经过训练之后,相比传统估计算法,其估计精度和稳定性均有所提高。

关键词: 基数估计,Hash函数,训练

Abstract: Cardinality estimation algorithm is an algorithm based on statistics,which can estimate the cardinality of the given data set.In this algorithm,hash function and some parameters are the key factors to the performance of the algorithm.Based on the related research,an algorithm was proposed which selects hash function and parameters based on the scale and type of data.The experimental results show that both accuracy and stabilization of this algorithm have significant improvement than the traditional estimation algorithm.

Key words: Cardinality estimation,Hash function,Training

[1] HEULE S,NUNKESSER M,HALL A .HyperLogLog in Practice:Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm [C]∥Proceedings of the 16th International Conference on Extending Database Technology(EDBT’13).2013:683-692.
[2] 张洋.解读Cardinality Estimation算法[EB/OL].[2014-12].http://blog.codinglabs.org/articles/algorithms-for-cardinality-estimation-parti.html.
[3] FLAJOLET P,MARTIN G N.Probabilistic Counting Algo-rithms for Data Base[J].Journal of Computer and System Scien-ces,1985,31(2):182-209.
[4] DURAND M,FLAJOLET P.Loglog Counting of Large Cardinalities [J].Lecture Notes in Computer Science,2003,2:605-617.
[5] FLAJOLETL P,FUSYL E,GANDOUET O,et al.HyperLog-Log:The analysis of a near-optimal cardinality estimation algorithm [C]∥Conference on Analysis of Algorithms,AofA 07,DMTCS proc.AH,2007:127-146.
[6] WHANG K Y,VANDER-ZANDEN B T,TAYLOR H M.ALinear-Time Probabilistic Counting Algorithm for Database Applications [J].ACM Transactions on Database Systems,1990,15(2):208-229.
[7] CAI M,PA J P,YU K,et al.Fast and Accurate Traffic MatrixMeasurement Using Adaptive Cardinality Counting [C]∥Procee-dings of the 2005 ACM SIGCOMM Workshop on Mining Network Data(MineNet’05).2005:205-206.
[8] CHEN Q,LIU Y H,NI,et al.Parallel and Distributed Cardinality Estimation for Large-scale RFID Systems [J].Parallel and Distributed Systems,2011,22(9):1441-1454.
[9] ZHANG N,ABOULNAGE,et al.Accurate and Fast Cardinality Estimation for XPath Queries [C]∥Proceedings of the 22nd International Conference on Data Engineering,2006(ICDE ’06).2006.
[10] GAO W W.An Algorithm Based on Virtual Vector for Measu-ring Host Cardinality Distribution[D].Dalian:Dalian Maritime University,2013.(in Chinese) 高文文.基于虚拟向量的主机基数分布测量算法[D].大连:大连海事大学,2013.
[11] KNUTH D E.The Art of Computer Programming vol.3-Sorting and Searching(Second Edition)[M].Addison-Wesley Professional,1998.
[12] 字符串Hash函数对比[EB/OL].http://blog.csdn.net/icefireelf/article/details/5796529.
[13] FNV哈希算法[EB/OL].http://blog.csdn.net/hustfoxy/article/details/23687239.
[14] 更快更好的哈希函数[EB/OL].http://blog.csdn.net/wisage/article/details/7104866.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!