Computer Science, 2013, Vol. 40, Issue (9): 163-168.

• Software and Database Technology •

Research on a High-Dimensional OLAP Query Modeling Method Based on the "C-Vine" Pair Copula

NI Zhi-wei, WANG Chao, GAO Ya-zhuo

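  1. Institute of Intelligent Management, School of Management, Hefei University of Technology, Hefei 230009; Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei University of Technology, Hefei 230009 (affiliation shared by all three authors)
  • Publication date: 2018-11-16 Release date: 2018-11-16
  • Funding:
    This work was supported by the National High Technology Research and Development Program of China (863 Program, 2011AA040501) and the National Natural Science Foundation of China (71271071, 70871033).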

Efficient Modeling Method for Multidimensional OLAP Query Based on “C-Vine” Pair Copula

NI Zhi-wei,WANG Chao and GAO Ya-zhuo   

  • Online:2018-11-16 Published:2018-11-16

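Abstract: The sharp growth in data warehouse dimensionality caused by the information explosion has greatly impaired the accuracy and efficiency of OLAP query models. This paper introduces the "C-vine" Pair Copula from mathematical statistics into OLAP query modeling for the first time, effectively addressing the "curse of dimensionality" in high-dimensional OLAP query modeling, and designs a parameter estimation method for the model to extract data synopsis knowledge. Experimental analysis shows that, compared with traditional methods, the Pair Copula-based model reduces the storage space of the data cube while preserving OLAP query accuracy, and achieves higher query efficiency in high-dimensional data environments.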

Keywords: OLAP approximate query, data cube, data synopsis, Pair Copula, C-vine
Chinese Library Classification number: TP311 Document code: A

Abstract: The rapidly increasing dimensionality of data warehouses caused by the recent information explosion greatly impairs the accuracy and efficiency of On-Line Analytical Processing (OLAP) query models. This paper applies the statistical concept of the "C-Vine" Pair Copula to OLAP query modeling for the first time, providing an effective solution to the "curse of dimensionality" in high-dimensional OLAP query models, and proposes a parameter estimation method for the model that extracts a data synopsis from the original mass data. Experimental results show that, compared with existing methods, the proposed Pair Copula-based model reduces the storage space of data cubes while maintaining the query accuracy of OLAP models, and provides better query efficiency for high-dimensional data cubes.

Key words: OLAP approximate query, Data cube, Data synopsis, Pair copula, C-vine
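
For orientation, the canonical-vine (C-vine) pair-copula construction that the abstract refers to is usually written in the vine-copula literature as the factorization sketched below. This is the standard textbook form, not necessarily the exact parameterization or variable ordering used in the paper. The symbols are notation assumed here: f_k is the marginal density of dimension k, F(· | ·) is a conditional distribution function, and c_{j,j+i|1,…,j-1} is a bivariate copula density.

```latex
% Standard C-vine decomposition of an n-dimensional joint density
% (notation assumed for illustration; the paper's own notation may differ)
f(x_1,\dots,x_n)
  = \prod_{k=1}^{n} f_k(x_k)\;
    \prod_{j=1}^{n-1}\prod_{i=1}^{n-j}
    c_{j,\,j+i \mid 1,\dots,j-1}\!\Bigl(
      F(x_j \mid x_1,\dots,x_{j-1}),\;
      F(x_{j+i} \mid x_1,\dots,x_{j-1})
    \Bigr)

% Special case n = 3, with x_1 as the root variable:
f(x_1,x_2,x_3)
  = f_1(x_1)\,f_2(x_2)\,f_3(x_3)\;
    c_{12}\bigl(F_1(x_1),F_2(x_2)\bigr)\;
    c_{13}\bigl(F_1(x_1),F_3(x_3)\bigr)\;
    c_{23\mid 1}\bigl(F(x_2 \mid x_1),F(x_3 \mid x_1)\bigr)
```

The double product contains n(n-1)/2 bivariate copula terms, so a synopsis built this way stores only marginal and pairwise parameters rather than a full n-dimensional cell grid, which is consistent with the storage reduction the abstract reports.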

