Computer Science ›› 2023, Vol. 50 ›› Issue (9): 145-151. doi: 10.11896/jsjkx.230300065
陈云亮, 刘浩, 朱桂水, 黄晓辉, 陈小岛, 王力哲
CHEN Yunliang, LIU Hao, ZHU Guishui, HUANG Xiaohui, CHEN Xiaodao, WANG Lizhe
Abstract: The optimal histogram is an important class of histogram techniques, but the dynamic programming bucketing algorithm currently used to construct optimal histograms suffers from excessively high time complexity. To address this, a supervised learning model based on ProbSparse self-attention is proposed to learn the dynamic programming bucketing algorithm and serve as a substitute for it. The model consists of three parts: 1) an embedding layer and a positional encoding layer map the input numeric sequence to a corresponding vector sequence; 2) a ProbSparse self-attention layer captures the dependencies within the input sequence; 3) a feed-forward network layer maps these dependencies to the index information of the bucket boundaries. Experimental results show that the model achieves accuracy above 83.47% on six datasets, and its prediction-stage time cost is less than one third of that of the dynamic programming bucketing algorithm.
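The following is a minimal PyTorch sketch of the three-part architecture described in the abstract, not the authors' released code: a value embedding plus positional encoding, a self-attention encoder, and a feed-forward head that scores each position as a candidate bucket boundary. Standard full attention stands in here for the ProbSparse attention of Informer, and names such as HistogramBucketizer, d_model, and the boundary-logit output format are illustrative assumptions.

```python
# Hypothetical sketch of the supervised bucketing model; hyperparameters are assumptions.
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding added to the value embeddings."""
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe.unsqueeze(0))  # shape (1, max_len, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return x + self.pe[:, : x.size(1)]


class HistogramBucketizer(nn.Module):
    """Maps a numeric sequence to per-position logits for 'this position is a bucket boundary'."""
    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(1, d_model)            # 1) numeric value -> vector sequence
        self.pos = PositionalEncoding(d_model)
        # 2) self-attention encoder; full attention is used here as a stand-in for
        #    ProbSparse attention, which keeps only the dominant queries.
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Sequential(                    # 3) feed-forward mapping to boundary scores
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))

    def forward(self, values):                        # values: (batch, seq_len)
        h = self.pos(self.embed(values.unsqueeze(-1)))
        h = self.encoder(h)
        return self.head(h).squeeze(-1)               # boundary logits per position


if __name__ == "__main__":
    model = HistogramBucketizer()
    x = torch.sort(torch.rand(8, 128), dim=1).values  # batch of sorted numeric sequences
    logits = model(x)                                  # (8, 128); top-k positions ~ bucket edges
    print(logits.shape)
```

Under this sketch, the per-position logits would be trained against the boundary indices produced by the dynamic programming algorithm, so one forward pass replaces the quadratic DP search at prediction time.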