计算机科学 ›› 2019, Vol. 46 ›› Issue (5): 150-156.doi: 10.11896/j.issn.1002-137X.2019.05.023

所属专题: 数据库技术

• 软件与数据库技术 • 上一篇    下一篇

DFTS:面向大数据集的Top-k Skyline查询算法

魏亮1, 林子雨1, 赖永炫2,3   

  1. (厦门大学信息科学与技术学院 福建 厦门361005)1
    (厦门大学软件学院 福建 厦门361005)2
    (厦门大学深圳研究院 广东 深圳518000)3
  • 收稿日期:2018-07-12 修回日期:2018-09-15 发布日期:2019-05-15
  • 作者简介:魏 亮(1992-),男,硕士生,主要研究领域为数据挖掘、大数据等;林子雨(1978-),男,CCF会员,主要研究领域为数据库、数据挖掘、大数据等,E-mail:ziyulin@xmu.edu.cn(通信作者);赖永炫(1981-),男,副教授,CCF会员,主要研究领域为数据库、传感器网络数据管理等。
  • 基金资助:
    国家自然科学基金(61672441),深圳市基础研究计划(JCYJ20170818141325209)资助。

DFTS:A Top-k Skyline Query for Large Datasets

WEI Liang1, LIN Zi-yu1, LAI Yong-xuan2,3   

  1. (School of Information Science and Engineering,Xiamen University,Xiamen,Fujian 361005,China)1
    (School of Software,Xiamen University,Xiamen,Fujian 361005,China)2
    (Shenzhen Research Institute,Xiamen University,Shenzhen,Guangdong 518000,China)3
  • Received:2018-07-12 Revised:2018-09-15 Published:2019-05-15

摘要: Top-k Skyline查询结合了Top-k与Skyline的特性,可以在数据集中找到最好的点。但是,现有的算法在大数据环境下具有较高的时间开销。文中提出一种新的算法DFTS,其可以高效地在大数据集中进行Top-k Skyline查询。DFTS包括3个步骤:首先,利用度值评价函数对数据集进行排序,快速过滤掉大量的点,仅保留足够少的候选集;然后,对候选集进行Skyline查询计算,进一步排除掉Skyline集合外的点;最后,筛选出Top-k的数据点作为最终结果。通过这种方式,DFTS有效减少了算法的运行时间。从理论上证明了DFTS查询的最终结果符合Top-k Skyline查询的要求。基于大数据集的大量实验表明,DFTS具有比现有算法更好的性能。

关键词: ApacheSpark, Skyline, Top-k

Abstract: Top-k Skyline query combines the features of Top-k query and Skyline,which can find the best object in the datasets.However,the available methods can not fit to large datasets well.An efficient Top-k Skyline query method called DFTS was proposed,which can perform well for large datasets.DFTS involves three steps.Firstly,the degreescore function is used to rank the dataset,and a large quantity of objects with low ranking will be filtered out.Secondly,DFTS makes a Skyline query upon the candidates and generates a Skyline subset.Finally,top-k objects with high ran-king will be selected from the Skyline subset as the final result.Through these steps,DFTS can significantly reduce the time cost.It is proved that the results of DFTS satisfy the demand of Top-k Skyline query.Extensive experimental results show that DFTS can achieve much better performance for large datasets than state-of-the-art methods.

Key words: Apache Spark, Skyline, Top-k

中图分类号: 

  • TP311
[1]BÖRZSÖNYI S,KOSSMANN D,STOCKER K.The Skyline operator[J].Data Engineering,2001,1:433-442.
[2]CHAN C Y,JAGADISH H V,TAN K L,et al.Finding k-dominant skylines in high dimensional space[C]∥Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data.Chicago:ACM,2006:503-514.
[3]SHARIFZADEH M,SHAHABI C.The spatial skyline queries[C]∥Proceedings of the 32nd International Conference on Very Large Data Bases.Seoul,Korea:VLDB Endowment,2006:751-762.
[4]MAN L Y,MAMOULIS N.Efficient processing of top-k dominating queries on multi-dimensional data[C]∥Proceedings of the 33rd International Conference on Very Large Data Bases.Vienna:VLDB Endowment,2007:483-494.
[5]ENDRES M,PREISINGER T.Behind the skyline[J].Procee-dings of DBKDA,2015,15.
[6]PREISINGER T,ENDRES M.Looking for the best,but not too many of them:multi-level and top-k skylines[J].Int.J.Adv.Softw,2015,8(3):4.
[7]SIDDIQUE M A,TIAN H,MORIMOTO Y.Selecting Representative Objects from Large Database by Using K-Skyband and Top-k Dominating Queries in MapReduce Environment[M]∥Advanced Data Mining and Applications.Springer International Publishing,2014:560-572.
[8]CHOMICKI J,GODFREY P,GRYZ J,et al.Skyline with presorting:Theory and optimizations[M]∥Intelligent Information Processing and Web Mining.Springer,Berlin,Heidelberg,2005:595-604.
[9]GODFREY P,SHIPLEY R,GRYZ J.Maximal vector computation in large data sets[C]∥Proceedings of the 31st International Conference on Very Large Data Bases.Trondheim:VLDB Endowment,2005:229-240.
[10]LEE K C K,ZHENG B,LI H,et al.Approaching the skyline in Z order[C]∥Proceedings of the 33rd International Conference on Very Large Data Bases.Vienna:VLDB Endowment,2007:279-290.
[11]LEE K C K,LEE W C,ZHENG B,et al.Z-SKY:an efficientskyline query processing framework based on Z-order[J].The VLDB Journal,2010,19(3):333-362.
[12]CHAUDHURI S,DALVI N,KAUSHIK R.Robust cardinality and cost estimation for skyline operator[C]∥22nd International Conference on Data Engineering (ICDE’06).Atlanta:IEEE,2006:64-64.
[13]ZHANG Z,YANG Y,CAI R,et al.Kernel-based skyline cardinality estimation[C]∥Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data.Providence:ACM,2009:509-522.
[14]TAn K L,Eng P K,Ooi B C.Efficient Progressive Skyline Computation[C]∥ International Conference on Very Large Data Bases.Rome:Morgan Kaufmann Publishers Inc.2001:301-310.
[15]KOSSMANN D,RAMSAK F,ROST S.Shooting stars in the sky:An online algorithm for skyline queries[C]∥Proceedings of the 28th International Conference on Very Large Data Bases.Hong Kong:VLDB Endowment,2002:275-286.
[16]PAPADIAS D,TAO Y,FU G,et al.Progressive skyline computation in database systems[J].Acm Transactions on Database Systems,2005,30(1):41-82.
[17]PAPADIAS D,TAO Y,FU G,et al.An optimal and progressive algorithm for skyline queries[C]∥Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data.San Diego:ACM,2003:467-478.
[1] 朱润泽, 秦小麟, 刘嘉琛.
基于查询对象的路网Skyline查询中Why-not问题的研究
Study on Why-not Problem in Skyline Query of Road Network Based on Query Object
计算机科学, 2021, 48(6): 57-62. https://doi.org/10.11896/jsjkx.200700016
[2] 王妍, 韩笑, 曾辉, 刘荆欣, 夏长清.
边缘计算环境下服务质量可信的任务迁移节点选择
Task Migration Node Selection with Reliable Service Quality in Edge Computing Environment
计算机科学, 2020, 47(10): 240-246. https://doi.org/10.11896/jsjkx.190900054
[3] 周剑刚, 秦小麟, 张珂珩, 许建秋.
基于道路网的多移动用户动态Skyline查询
Dynamic Skyline Query for Multiple Mobile Users Based on Road Network
计算机科学, 2019, 46(9): 73-78. https://doi.org/10.11896/j.issn.1002-137X.2019.09.009
[4] 齐玉东,何诚,司维超.
MapReduce框架下的Skyline云资源选择算法
Cloud Resource Selection Algorithm by Skyline under MapReduce Frame
计算机科学, 2018, 45(6A): 411-414.
[5] 孙志, 孙雪姣.
P2P环境中的skyline查询综述
Survey of Skyline Processing in P2P Environments
计算机科学, 2018, 45(11A): 63-70.
[6] 董雷刚,刘国华,崔晓微.
PPQ:一种基于区域划分的c-skyline查询算法
PPQ:Finding Combinatorial Skyline Based on Partition
计算机科学, 2018, 45(1): 267-272. https://doi.org/10.11896/j.issn.1002-137X.2018.01.047
[7] 戴华,叶庆群,杨庚,肖甫,何瑞良.
两层传感器网络中安全Top-k查询处理技术综述
Overview of Secure Top-k Query Processing in Two-tiered Wireless Sensor Networks
计算机科学, 2017, 44(5): 6-13. https://doi.org/10.11896/j.issn.1002-137X.2017.05.002
[8] 余未,郑吉平,王海翔,王永阁,陈嘉良,江顺青.
空间Skyline查询处理:应用、研究与挑战
Spatial Skyline Queries:Applications,Research and Challenges
计算机科学, 2017, 44(2): 1-16. https://doi.org/10.11896/j.issn.1002-137X.2017.02.001
[9] 李青,肖迎元,王晓晔,李玉坤.
无线传感器网络中基于聚簇结构的Skyline查询方法
Clustering Architecture-based Skyline Query Processing in Wireless Sensor Networks
计算机科学, 2017, 44(10): 177-181. https://doi.org/10.11896/j.issn.1002-137X.2017.10.033
[10] 郑诗敏,秦小麟,刘亮,周倩.
流数据Top-K关键字查询算法
Algorithm for Top-K Keyword Query in Data Streams
计算机科学, 2016, 43(8): 142-147. https://doi.org/10.11896/j.issn.1002-137X.2016.08.030
[11] 崔文相,肖迎元,郝刚,王洪亚,邓华锋.
基于MapReduce的Skyline查询处理算法
MapReduce-based Skyline Query Processing Algorithm
计算机科学, 2016, 43(6): 35-38. https://doi.org/10.11896/j.issn.1002-137X.2016.06.007
[12] 郭长友,郑雪峰,高秀莲.
基于不确定理论的不确定性数据Top-k查询计算
Top-k Query Calculation of Uncertain Data Based on Uncertainty Theory
计算机科学, 2016, 43(3): 225-230. https://doi.org/10.11896/j.issn.1002-137X.2016.03.041
[13] 赵法信,金义富.
Vague数据库Skyline查询技术研究
Study on Skyline Query for Vague Database
计算机科学, 2015, 42(8): 236-239.
[14] 张卫华,李小勇,马俊,余杰.
REPS:一种高效的容错并行概率流Skyline查询方法
REPS:An Efficient Fault-tolerant Approach for Parallel Skyline Queries over Probabilistic Data Streams
计算机科学, 2015, 42(8): 225-230.
[15] 张燕平,荆紫慧,张以文,钱付兰,石 磊.
基于离散粒子群算法的动态Web服务组合
Dynamic Web Service Composition Based on Discrete Particle Swarm Optimization
计算机科学, 2015, 42(6): 71-75. https://doi.org/10.11896/j.issn.1002-137X.2015.06.016
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!