Computer Science ›› 2023, Vol. 50 ›› Issue (9): 145-151. doi: 10.11896/jsjkx.230300065

• Database & Big Data & Data Science •

Study on Supervised Learning Model for Optimal Histogram Solution

CHEN Yunliang, LIU Hao, ZHU Guishui, HUANG Xiaohui, CHEN Xiaodao, WANG Lizhe   

  1. School of Computer Science, China University of Geosciences, Wuhan 430070, China
  • Received: 2023-03-08  Revised: 2023-06-18  Online: 2023-09-15  Published: 2023-09-01
  • About author: CHEN Yunliang, born in 1979, professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include cloud computing and data mining.
    HUANG Xiaohui, born in 1994, lecturer, is a member of China Computer Federation. His main research interests include geoscience data management, processing and analysis.
  • Supported by:
    National Natural Science Foundation of China (62076224), National Science Fund for Distinguished Young Scholars of China (41925007), Joint Funds of the National Natural Science Foundation of China (U21A2013) and Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (KLIGIP-2022B16).

Abstract: The optimal histogram is currently computed with the dynamic programming binning algorithm, whose time complexity is prohibitively high. This paper proposes a supervised learning model based on ProbSparse self-attention that learns the dynamic programming binning algorithm and can be used as an alternative to it. The proposed model consists of three parts: 1) an embedding and positional encoding layer maps the numerical input sequence into a corresponding vector sequence; 2) ProbSparse self-attention captures the dependencies within the input sequence; 3) a feedforward neural network maps these dependencies to the index information of the bin (“bucket”) boundaries. Experimental results on six datasets show that the proposed ProbSparse self-attention-based model outperforms the dynamic programming binning algorithm: its accuracy is greater than 83.47%, and its time cost in the prediction stage is no more than one third of that of the compared method.
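As context for the abstract, a minimal Python sketch of the dynamic programming binning algorithm (in the standard V-optimal formulation, which minimizes the within-bucket sum of squared errors) is given below. It is an illustration rather than the authors' implementation; the function name and interface are assumptions. Its triple loop over buckets and positions is what makes the method costly for long inputs.

```python
import numpy as np

def v_optimal_histogram(values, num_buckets):
    """Illustrative V-optimal dynamic-programming binning (not the paper's code):
    choose bucket boundaries that minimize the total within-bucket sum of
    squared errors (SSE). Returns the minimal SSE and the end index of each bucket."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Prefix sums let the SSE of any interval values[i:j] be computed in O(1).
    prefix = np.concatenate(([0.0], np.cumsum(values)))
    prefix_sq = np.concatenate(([0.0], np.cumsum(values ** 2)))

    def sse(i, j):
        # SSE of values[i:j] when approximated by its mean.
        s, s2, m = prefix[j] - prefix[i], prefix_sq[j] - prefix_sq[i], j - i
        return s2 - s * s / m

    INF = float("inf")
    # dp[k][j]: minimal SSE of covering values[:j] with exactly k buckets.
    dp = [[INF] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    dp[0][0] = 0.0
    for k in range(1, num_buckets + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):          # the last bucket is values[i:j]
                cand = dp[k - 1][i] + sse(i, j)
                if cand < dp[k][j]:
                    dp[k][j], cut[k][j] = cand, i
    # Backtrack to recover the bucket boundary indices.
    bounds, j = [], n
    for k in range(num_buckets, 0, -1):
        bounds.append(j)
        j = cut[k][j]
    return dp[num_buckets][n], sorted(bounds)


# Example: split 8 values into 3 buckets; boundaries are exclusive end indices.
err, boundaries = v_optimal_histogram([1, 1, 2, 9, 9, 10, 20, 21], 3)
print(err, boundaries)
```

The second sketch illustrates the three-part model described in the abstract. It is likewise an assumption-laden illustration in PyTorch: standard multi-head self-attention stands in for the ProbSparse self-attention of Informer, and all layer names, sizes, and the per-position boundary-classification head are hypothetical choices, not the authors' configuration.

```python
import math
import torch
import torch.nn as nn

class BinBoundaryPredictor(nn.Module):
    """Sketch of the three-part architecture described in the abstract:
    (1) embedding + positional encoding of the numerical input sequence,
    (2) self-attention to capture dependencies between positions (plain
        multi-head attention stands in here for ProbSparse self-attention),
    (3) a feedforward head that scores each position as a bucket boundary."""

    def __init__(self, d_model=64, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.value_embed = nn.Linear(1, d_model)           # numeric value -> vector
        # Fixed sinusoidal positional encoding.
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                  nn.Linear(d_model, 1))    # per-position boundary logit

    def forward(self, x):                                   # x: (batch, seq_len)
        h = self.value_embed(x.unsqueeze(-1)) + self.pe[: x.size(1)]
        h = self.encoder(h)                                 # dependencies between positions
        return self.head(h).squeeze(-1)                     # (batch, seq_len) logits


# Per-position boundary labels (e.g. from the dynamic program above) can be
# learned with a binary cross-entropy loss; the all-zero target is a dummy.
model = BinBoundaryPredictor()
x = torch.rand(2, 100)
logits = model(x)
loss = nn.functional.binary_cross_entropy_with_logits(logits, torch.zeros_like(logits))
```

Under these assumptions, the model is trained offline on boundary labels produced by the dynamic program, so that at prediction time a single forward pass replaces the quadratic dynamic programming search.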

Key words: Optimal histogram, Dynamic programming binning algorithm, Supervised learning model

CLC Number: TP391