Computer Science ›› 2017, Vol. 44 ›› Issue (2): 279-282.doi: 10.11896/j.issn.1002-137X.2017.02.047

Previous Articles     Next Articles

Parameter Analysis and Optimization of Cardinality Estimation Algorithm

LIU Shao-ji, CAO Yang and CUI Meng-tian   

  • Online:2018-11-13 Published:2018-11-13

Abstract: Cardinality estimation algorithm is an algorithm based on statistics,which can estimate the cardinality of the given data set.In this algorithm,hash function and some parameters are the key factors to the performance of the algorithm.Based on the related research,an algorithm was proposed which selects hash function and parameters based on the scale and type of data.The experimental results show that both accuracy and stabilization of this algorithm have significant improvement than the traditional estimation algorithm.

Key words: Cardinality estimation,Hash function,Training

[1] HEULE S,NUNKESSER M,HALL A .HyperLogLog in Practice:Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm [C]∥Proceedings of the 16th International Conference on Extending Database Technology(EDBT’13).2013:683-692.
[2] 张洋.解读Cardinality Estimation算法[EB/OL].[2014-12].http://blog.codinglabs.org/articles/algorithms-for-cardinality-estimation-parti.html.
[3] FLAJOLET P,MARTIN G N.Probabilistic Counting Algo-rithms for Data Base[J].Journal of Computer and System Scien-ces,1985,31(2):182-209.
[4] DURAND M,FLAJOLET P.Loglog Counting of Large Cardinalities [J].Lecture Notes in Computer Science,2003,2:605-617.
[5] FLAJOLETL P,FUSYL E,GANDOUET O,et al.HyperLog-Log:The analysis of a near-optimal cardinality estimation algorithm [C]∥Conference on Analysis of Algorithms,AofA 07,DMTCS proc.AH,2007:127-146.
[6] WHANG K Y,VANDER-ZANDEN B T,TAYLOR H M.ALinear-Time Probabilistic Counting Algorithm for Database Applications [J].ACM Transactions on Database Systems,1990,15(2):208-229.
[7] CAI M,PA J P,YU K,et al.Fast and Accurate Traffic MatrixMeasurement Using Adaptive Cardinality Counting [C]∥Procee-dings of the 2005 ACM SIGCOMM Workshop on Mining Network Data(MineNet’05).2005:205-206.
[8] CHEN Q,LIU Y H,NI,et al.Parallel and Distributed Cardinality Estimation for Large-scale RFID Systems [J].Parallel and Distributed Systems,2011,22(9):1441-1454.
[9] ZHANG N,ABOULNAGE,et al.Accurate and Fast Cardinality Estimation for XPath Queries [C]∥Proceedings of the 22nd International Conference on Data Engineering,2006(ICDE ’06).2006.
[10] GAO W W.An Algorithm Based on Virtual Vector for Measu-ring Host Cardinality Distribution[D].Dalian:Dalian Maritime University,2013.(in Chinese) 高文文.基于虚拟向量的主机基数分布测量算法[D].大连:大连海事大学,2013.
[11] KNUTH D E.The Art of Computer Programming vol.3-Sorting and Searching(Second Edition)[M].Addison-Wesley Professional,1998.
[12] 字符串Hash函数对比[EB/OL].http://blog.csdn.net/icefireelf/article/details/5796529.
[13] FNV哈希算法[EB/OL].http://blog.csdn.net/hustfoxy/article/details/23687239.
[14] 更快更好的哈希函数[EB/OL].http://blog.csdn.net/wisage/article/details/7104866.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!