Computer Science ›› 2017, Vol. 44 ›› Issue (2): 279-282.doi: 10.11896/j.issn.1002-137X.2017.02.047

Previous Articles     Next Articles

Parameter Analysis and Optimization of Cardinality Estimation Algorithm

LIU Shao-ji, CAO Yang and CUI Meng-tian   

  • Online:2018-11-13 Published:2018-11-13

Abstract: Cardinality estimation algorithm is an algorithm based on statistics,which can estimate the cardinality of the given data set.In this algorithm,hash function and some parameters are the key factors to the performance of the algorithm.Based on the related research,an algorithm was proposed which selects hash function and parameters based on the scale and type of data.The experimental results show that both accuracy and stabilization of this algorithm have significant improvement than the traditional estimation algorithm.

Key words: Cardinality estimation,Hash function,Training

[1] HEULE S,NUNKESSER M,HALL A .HyperLogLog in Practice:Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm [C]∥Proceedings of the 16th International Conference on Extending Database Technology(EDBT’13).2013:683-692.
[2] 张洋.解读Cardinality Estimation算法[EB/OL].[2014-12].http://blog.codinglabs.org/articles/algorithms-for-cardinality-estimation-parti.html.
[3] FLAJOLET P,MARTIN G N.Probabilistic Counting Algo-rithms for Data Base[J].Journal of Computer and System Scien-ces,1985,31(2):182-209.
[4] DURAND M,FLAJOLET P.Loglog Counting of Large Cardinalities [J].Lecture Notes in Computer Science,2003,2:605-617.
[5] FLAJOLETL P,FUSYL E,GANDOUET O,et al.HyperLog-Log:The analysis of a near-optimal cardinality estimation algorithm [C]∥Conference on Analysis of Algorithms,AofA 07,DMTCS proc.AH,2007:127-146.
[6] WHANG K Y,VANDER-ZANDEN B T,TAYLOR H M.ALinear-Time Probabilistic Counting Algorithm for Database Applications [J].ACM Transactions on Database Systems,1990,15(2):208-229.
[7] CAI M,PA J P,YU K,et al.Fast and Accurate Traffic MatrixMeasurement Using Adaptive Cardinality Counting [C]∥Procee-dings of the 2005 ACM SIGCOMM Workshop on Mining Network Data(MineNet’05).2005:205-206.
[8] CHEN Q,LIU Y H,NI,et al.Parallel and Distributed Cardinality Estimation for Large-scale RFID Systems [J].Parallel and Distributed Systems,2011,22(9):1441-1454.
[9] ZHANG N,ABOULNAGE,et al.Accurate and Fast Cardinality Estimation for XPath Queries [C]∥Proceedings of the 22nd International Conference on Data Engineering,2006(ICDE ’06).2006.
[10] GAO W W.An Algorithm Based on Virtual Vector for Measu-ring Host Cardinality Distribution[D].Dalian:Dalian Maritime University,2013.(in Chinese) 高文文.基于虚拟向量的主机基数分布测量算法[D].大连:大连海事大学,2013.
[11] KNUTH D E.The Art of Computer Programming vol.3-Sorting and Searching(Second Edition)[M].Addison-Wesley Professional,1998.
[12] 字符串Hash函数对比[EB/OL].http://blog.csdn.net/icefireelf/article/details/5796529.
[13] FNV哈希算法[EB/OL].http://blog.csdn.net/hustfoxy/article/details/23687239.
[14] 更快更好的哈希函数[EB/OL].http://blog.csdn.net/wisage/article/details/7104866.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] LEI Li-hui and WANG Jing. Parallelization of LTL Model Checking Based on Possibility Measure[J]. Computer Science, 2018, 45(4): 71 -75 .
[2] SUN Qi, JIN Yan, HE Kun and XU Ling-xuan. Hybrid Evolutionary Algorithm for Solving Mixed Capacitated General Routing Problem[J]. Computer Science, 2018, 45(4): 76 -82 .
[3] ZHANG Jia-nan and XIAO Ming-yu. Approximation Algorithm for Weighted Mixed Domination Problem[J]. Computer Science, 2018, 45(4): 83 -88 .
[4] WU Jian-hui, HUANG Zhong-xiang, LI Wu, WU Jian-hui, PENG Xin and ZHANG Sheng. Robustness Optimization of Sequence Decision in Urban Road Construction[J]. Computer Science, 2018, 45(4): 89 -93 .
[5] SHI Wen-jun, WU Ji-gang and LUO Yu-chun. Fast and Efficient Scheduling Algorithms for Mobile Cloud Offloading[J]. Computer Science, 2018, 45(4): 94 -99 .
[6] ZHOU Yan-ping and YE Qiao-lin. L1-norm Distance Based Least Squares Twin Support Vector Machine[J]. Computer Science, 2018, 45(4): 100 -105 .
[7] LIU Bo-yi, TANG Xiang-yan and CHENG Jie-ren. Recognition Method for Corn Borer Based on Templates Matching in Muliple Growth Periods[J]. Computer Science, 2018, 45(4): 106 -111 .
[8] GENG Hai-jun, SHI Xin-gang, WANG Zhi-liang, YIN Xia and YIN Shao-ping. Energy-efficient Intra-domain Routing Algorithm Based on Directed Acyclic Graph[J]. Computer Science, 2018, 45(4): 112 -116 .
[9] CUI Qiong, LI Jian-hua, WANG Hong and NAN Ming-li. Resilience Analysis Model of Networked Command Information System Based on Node Repairability[J]. Computer Science, 2018, 45(4): 117 -121 .
[10] WANG Zhen-chao, HOU Huan-huan and LIAN Rui. Path Optimization Scheme for Restraining Degree of Disorder in CMT[J]. Computer Science, 2018, 45(4): 122 -125 .