计算机科学 ›› 2016, Vol. 43 ›› Issue (6): 35-38.doi: 10.11896/j.issn.1002-137X.2016.06.007

• 目次 • 上一篇    下一篇

基于MapReduce的Skyline查询处理算法

崔文相,肖迎元,郝刚,王洪亚,邓华锋   

  1. 天津理工大学计算机与通信工程学院 天津300384;天津市智能计算及软件新技术重点实验室 天津300384,天津理工大学计算机与通信工程学院 天津300384;天津市智能计算及软件新技术重点实验室 天津300384,天津理工大学计算机与通信工程学院 天津300384;天津市智能计算及软件新技术重点实验室 天津300384,东华大学计算机科学与技术学院 上海201620,江西师范大学计算机信息工程学院 南昌330022
  • 出版日期:2018-12-01 发布日期:2018-12-01
  • 基金资助:
    本文受国家自然科学基金(61170174,5),天津市创新团队计划(TD12-5016)资助

MapReduce-based Skyline Query Processing Algorithm

CUI Wen-xiang, XIAO Ying-yuan, HAO Gang, WANG Hong-ya and DENG Hua-feng   

  • Online:2018-12-01 Published:2018-12-01

摘要: Skyline查询是一个典型的多目标优化查询,在多目标优化、数据挖掘等领域有着广泛的应用。现有的Skyline查询处理算法大都假定数据集存放在单一数据库服务器中,查询处理算法通常也被设计成针对单一服务器的串行算法。随着数据量的急剧增长,特别是在大数据背景下,传统的基于单机的串行Skyline算法已经远远不能满足用户的需求。基于流行的分布式并行编程框架MapReduce,研究了适用于大数据集的并行Skyline查询算法。针对影响MapReduce计算的因素,对现有基于角度的划分策略进行了改进,提出了Balanced Angular划分策略;同时,为了减少Reduce过程的计算量,提出了在Map端预先进行数据过滤的策略。实验结果显示所提出的Skyline查询算法能显著提升系统性能。

关键词: MapReduce,Skyline,数据划分

Abstract: Skyline query is a typical multi-objective optimization problem and is widely applied in multi-objective optimization,data mining and other fields.Most of the existing Skyline query processing algorithms assume that the data set is placed in a single server,and query processing algorithm is designed as a serial algorithm for a single server.With the rapid growth of data,especially under the background of big data,the traditional serial Skyline algorithms based on a single computer are far from enough to meet the needs of users.Based on the popular distributed parallel programming framework MapReduce,this paper studied the parallel skyline query algorithm suitable for large data sets.Aiming at the factors affecting MapReduce,this paper improved the existing data partition strategy based on angles and proposed the data partition strategy based on Balanced Angular.Meanwhile,to reduce the computation of Reduce phase,this paper proposed the data filtering strategy in advance at Map.The experimental results show that the proposed Skyline query algorithm can improve system performance significantly.

Key words: MapReduce,Skyline,Data partition

[1] Balke W T,Güntzer U.Multi-objective query processing for database systems[C]∥Int’l Conference on Very Large Data Bases (VLDB).2004:936-947
[2] Wei Xiao-juan,Yang Jing,Li Cui-ping,et al.Skyline query[J].Journal of Software,2008,9(6):1386-1400(in Chinese) 魏小娟,杨婧,李翠平,等.Skyline查询处理[J].软件学报,2008,9(6):1386-1400
[3] Kung H T,Luccio F,Preparata F P.On finding the maxima of a set of vectors[J].Journal of ACM,1975,2(4):469-476
[4] Preparata F P,Shamos M I.Computational Geometry:An introduction[M].New York:Springer-Verlag,1985
[5] Brzsnyi S,Kossmann D,Stocker K.The skyline operator[C]∥IEEE International Conference on Data Engineering (ICDE).2001:421-430
[6] Chomicki J,Godfrey P,Gryz J,et al.Skyline with presorting[C]∥IEEE International Conference on Data Engineering (ICDE).2003:717-816
[7] Godfrey P,Shipley R,Gryz J,et al.Maximal vector computation in large data sets[C]∥Int’l Conference on Very Large Data Bases (VLDB).2005:229-240
[8] Bartolini I,Ciaccia P,Patella M,et al.Efficient sort-based skyline evaluation[J].ACM Transactions on Database Systems (TODS),2008,3(4):1-45
[9] Tan K L,Eng P K.Efficient progressive skyline computation[C]∥IEEE International Conference on Data Engineering (ICDE).2001:301-310
[10] Papadias D,Tao Y.An optimal and progressive algorithm forskyline queries[C]∥ACM SIGMOD Int’l.Conference on Ma-nagement of Data (SIGMOD).2003:467-478
[11] Papadias D,Tao Y.Progressive skyline computation in database systems[J].ACM Transactions on Database Systems (TODS),2005,30(1):41-82
[12] Dean J,Ghemawat S.MapReduce:Simplified Data Processing on Large Clusters[J].Communications of the ACM,2008,51(1):107-113
[13] Zhang Bo-liang,Zhou Shui-geng,Guan Ji-hong.Skyline under Map-Reduce framework[J].Computer Science and Exploration,2011,5(5):385-397(in Chinese) 张波良,周水庚,关佶红.MapReduce框架下的Skyline计算[J].计算机科学与探索,2011,5(5):385-397
[14] Chen L,Hwang K,Wu J.MapReduce Skyline Query Processing with A New Angular Partitioning Approach[C]∥Proc.of International Parallel and Distributed Processing Symposium.2012:2262-2270

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!