计算机科学 ›› 2020, Vol. 47 ›› Issue (9): 67-73.doi: 10.11896/jsjkx.190800143
韦建华, 许建秋
WEI Jian-hua, XU Jian-qiu
摘要: 时态数据在医疗、经济和电子商务等领域有着广泛的应用。由于时间的测量技术不精确等因素,时态数据具有不确定性。文中针对该数据进行研究,处理Top-k查询,即返回与查询点相交的k个权值最大的数据,该权值是根据数据权值和相交概率按一定规则组合计算所得。为有效解决该查询问题,提出了一个基于关系模型和辅助结构的2D R-tree结构,其中关系模型用于管理所有区间数据的R-tree,辅助结构用于管理R-tree中每个节点内部数据权值的大小关系。基于该结构,提出了按权值的降序访问数据的查询算法。从根节点开始遍历R-tree,对于与查询点相交的节点,根据辅助结构中存储的信息找到数据权值最大的项,将它确定为下一个访问对象。实验使用数据规模在30万到1000万的合成数据集,以及包括大约320万条的航班信息的真实数据集。在可扩展数据库SECONDO系统下,将所提方法与无索引方法、R-tree和区间树方法在性能上进行比较,并以平均I/O访问次数和CPU时间作为性能的评判指标。实验结果表明,在1000万条的数据规模下,所提方法优于对比方法2~3个数量级。通过将实验返回的k个结果的概率与权值和实际相交数据的概率和权值作比较可以发现,实验返回的k个结果的概率与权值均靠近实际相交数据的概率和权值的最大值,因此所提算法可行且有效。
中图分类号:
[1] LI R,ZHANG X,ZHOU X,et al.INK:A Cloud-Based System for Efficient Top-k Interval Keyword Search[C]//Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management.New York,2014:2003-2005. [2] SNODGRASS R T,AHN I.A Taxonomy of Time in Databases.[J].ACM,1985,14(4):236-246. [3] UYSAL M.Efficiently Processing Queries on Interval-and-Va-lue Tuples in Relational Databases[C]//International Confe-rence on Very Large Data Bases.DBLP,2005:385-396. [4] ARGE L,VITTER J S.Optimal external memory interval ma-nagement [J].SIAMJ.2003,32(6):1488-1508. [5] HAMPEL M.Joining Interval Data in Relational Databases[C]//ACM SIGMOD International Conference on Management of Data.DBLP,2004:683-694. [6] LAYER R M,SKADRON K,ROBINS G,et al.Binary Interval Search:a scalable algorithm for counting interval intersections[J].Bioinformatics,2013,29(1):1-7. [7] AGARWAL P K,ARGE L,YI K.An optimal dynamic interval stabbing-max data structure[C]//Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms.Vancouver,SODA,2005:803-812. [8] SFAKIANAKIS G,PATLAKAS I,NTARMOS N,et al.Interval indexing and querying on key-value cloud stores[C]//Proceedings of IEEE 29th International Conference on Data Engineering.Brisbane:IEEE Press,2013:805-816. [9] XU J Q,LU H.Efficiently answer top-k queries on typed intervals[J].Information Systems,2017,71:164-181. [10] LI S,WANG H.Dispersing points on intervals [J].Discrete Applied Mathematics,2018,239:106-118. [11] COWLEY W,PLEXOUSAKIS D.An Interval Algebra for Indeterminate Time[C]//Seventeenth National Conference on Artificial Intelligence.AAAI,2000:470-475. [12] PLESNIEWICZ G S,VU NGUEN T M,DMITRY M.Queryanswering over fact bases for fuzzy interval ontologies[C]//Proceedings of the 2nd International Workshop on Soft Computing Applications and Knowledge Discovery.Moscow,2016:93-100. [13] WU W Y,LI Y,NI Z W,et al.Probabilistic interval-valued hesitant fuzzy information aggregation operators and their application to multi-attribute decision making[J].Algorithms,2018,11(8):120-128. [14] ZHANG S C.Interval-Gap-Based Representation of Temporal Knowledge[J].Journal of Software,1994,5(6):49-54. [15] LIN J Y,PENG H,XIE J M.Uncertain temporal information representation and the extensions of temporal operation[J].Computer Science,2005,32(8):161-163. [16] KATERINA P,MARTIN T,MICHAEL B.Supporting Set Op-erations in Temporal-Probabilistic Databases[C]//34th IEEE. International Conference on Data Engineering(ICDE).2018:1180-1191. [17] BLANKENAGEL G.External segment trees[J].Algorithmica,1994,12(6):498-532. [18] IMAI H,ASANO T.Dynamic orthogonal segment intersection search[J].Journal of Algorithms,1987,8(1):1-18. [19] MCCREIGHT E M.Priority search trees[J].SIAM J,1985,14(2):257-276. [20] GUY D T MOL R D,BRONSELAER A.Indexing Possibilistic Numerical Data:The Interval B+-tree Approach[M]//Information Processing and Management of Uncertainty in Knowledge-Based Systems.Springer International Publishing,2016:305-316. [21] MEI W,MENG X,PENG S,et al.A hybrid index for temporal big data[J].China Modern Medicine,2017,72:264-272. [22] KRIEGEL H P,PÖTKE,MARCO,et al.Managing IntervalsEfficiently in Object-Relational Databases[C]//International Conference on Very Large Data Bases.Morgan Kaufmann Publishers Inc,2000:407-418. [23] BLIUJUTE R,JENSEN C S,SALTENIS S,et al.R-tree Based Indexing of Now-Relative Bitemporal Data[C]//International Conference on Very Large Data Bases.Morgan Kaufmann Publishers Inc.,1998:345-356. [24] ZENG X J.Temporal index trees for uncertain temporal information [J].Microcomputer Information,2011(5):236-237. [25] GÜTING R H,BEHR T,DÜNTGEN C.SECONDO:A plat-form for moving objects database rsearch and for publishing and integrating research implementations[J].IEEE Data Eng,2010,33(2):56-63. |
[1] | 林潮伟, 林兵, 陈星. 边缘环境下基于模糊理论的科学工作流调度研究 Study on Scientific Workflow Scheduling Based on Fuzzy Theory Under Edge Environment 计算机科学, 2022, 49(2): 312-320. https://doi.org/10.11896/jsjkx.201000102 |
[2] | 穆聪聪, 王一舒, 袁野, 乔百友, 马玉亮. 时序图中Top-k稠密子图查询算法研究 Top-k Densest Subgraphs Search in Temporal Graphs 计算机科学, 2021, 48(10): 152-159. https://doi.org/10.11896/jsjkx.201100005 |
[3] | 杨洁,王国胤,李帅. 基于边界域的邻域知识距离度量模型 Neighborhood Knowledge Distance Measure Model Based on Boundary Regions 计算机科学, 2020, 47(3): 61-66. https://doi.org/10.11896/jsjkx.190500174 |
[4] | Renata WONG. 量子计算与不确定性原理 Uncertainty Principle as Related to Quantum Computation 计算机科学, 2020, 47(1): 40-50. https://doi.org/10.11896/jsjkx.190400432 |
[5] | 杨洁, 王国胤, 张清华, 冯林. 层次粒结构下粗糙模糊集的不确定性度量 Uncertainty Measure of Rough Fuzzy Sets in Hierarchical Granular Structure 计算机科学, 2019, 46(1): 45-50. https://doi.org/10.11896/j.issn.1002-137X.2019.01.007 |
[6] | 郑宏亮, 侯雪辉, 宋笑迎, 庞阔, 邹丽. 基于犹豫模糊可信度的知识推理 Approach for Knowledge Reasoning Based on Hesitate Fuzzy Credibility 计算机科学, 2019, 46(1): 131-137. https://doi.org/10.11896/j.issn.1002-137X.2019.01.020 |
[7] | 许华杰, 吴青华, 胡小明. 基于轨迹多特性的隐私保护算法 Privacy Protection Algorithm Based on Multi-characteristics of Trajectory 计算机科学, 2019, 46(1): 190-195. https://doi.org/10.11896/j.issn.1002-137X.2019.01.029 |
[8] | 夏扬波, 杨文忠, 张振宇, 王庆鹏, 石研. 一种移动无线传感器网络的节点位置预测方法 Node Position Prediction Method for Mobile Wireless Sensor Networks 计算机科学, 2018, 45(8): 113-118. https://doi.org/10.11896/j.issn.1002-137X.2018.08.020 |
[9] | 汪琳娜, 杨新, 杨习贝. 监督邻域粗糙集 Supervised Neighborhood Rough Set 计算机科学, 2018, 45(8): 186-190. https://doi.org/10.11896/j.issn.1002-137X.2018.08.033 |
[10] | 毛莺池,陈杨. 不确定性车辆路口的轨迹预测 Uncertain Vehicle Intersection Trajectory Prediction 计算机科学, 2018, 45(3): 235-240. https://doi.org/10.11896/j.issn.1002-137X.2018.03.037 |
[11] | 戴华, 保静静, 朱向洋, 易训, 杨庚. 面向云环境的一致性可验证单关键词检索方法 Integrity-verifying Single Keyword Search Method in Clouds 计算机科学, 2018, 45(12): 92-97. https://doi.org/10.11896/j.issn.1002-137X.2018.12.014 |
[12] | 魏方圆,黄德才. 基于区间数的多维不确定性数据UID-DBSCAN聚类算法 UID-DBSCAN Clustering Algorithm of Multi-dimensional Uncertain Data Based on Interval Number 计算机科学, 2017, 44(Z11): 442-447. https://doi.org/10.11896/j.issn.1002-137X.2017.11A.094 |
[13] | 赵凡,魏玲. D型概率决策形式背景下的规则获取 Rule Acquisition of D-type Probabilistic Decision Formal Context 计算机科学, 2017, 44(8): 274-279. https://doi.org/10.11896/j.issn.1002-137X.2017.08.047 |
[14] | 徐俊洁,陈荣. 基于故障关联的多故障概率诊断方法 Probabilistic Diagnosis Approach to Diagnosing Multiple-fault Programs with Fault Correlation 计算机科学, 2017, 44(4): 124-130. https://doi.org/10.11896/j.issn.1002-137X.2017.04.027 |
[15] | 刘德浩,王倩. 一种具有相依关系的二维云推理方法及其在预测中的应用 Uncertainty Reasoning Based on Related Planar Cloud and Application in Prediction 计算机科学, 2016, 43(Z6): 99-102. https://doi.org/10.11896/j.issn.1002-137X.2016.6A.024 |
|