计算机科学 ›› 2018, Vol. 45 ›› Issue (9): 213-219.doi: 10.11896/j.issn.1002-137X.2018.09.035
赵尔平1, 孟小峰2
ZHAO Er-ping1, MENG Xiao-feng2
摘要: 针对Spark引擎不支持多维空间查询的问题,提出基于R树的二级空间索引,即在每个Worker节点上创建R子树,并将这些子树作为孩子,在Master节点上创建R树。针对LRU算法内存替换粒度粗、结果不够精确的问题,提出基于数据使用权重的内存替换方法。该方法将每次实际使用数据量与其总量的比值作为替换权重,将热点场景数据以RDD形式持久化至内存中,提高了基于内存查询的效率。根据远粗近细的视觉原理提出细节层次查询,该方法将最能代表物体特征的点云数据先传输给客户端,或者仅把简化模型点数据传给客户端,以解决网络带宽不足和数据加载延迟的问题。实验证明,文中方法能有效解决Spark多维空间的查询问题,查询效率得到了明显提高。
中图分类号:
[1]ELWIRA H,JERZY W,ROBERT S,et al.Color based Algo-rithm for Automatic Merging of Multiview 3D Point Clouds[J].ACM Journal on Computing and Cultural Heritage,2014,7(3):1-21. [2]RUFAEL M,PABLO C.MP3DG-PCC Open Source Software Framework for Implementation and Evaluation of Point Cloud Compression[C]∥Proceedings of the 2016 ACM Conference on Multimedia Conference.New York:ACM,2016:1222-1226. [3]ROMULO G,TOMV T,KOSTIS K,et al.Aspatial column store to triangulate the Netherlands on the fly[C]∥Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems.NewYork:ACM,2016:1-4. [4]ZHANG R,QI J Z,MARTIN S,et al.Towardsa Painless Index for Spatial Objects[J].ACM Transactions on Database Systems,2014,39(3):1-19. [5]RIZKI P N M,PARK J S,OH S,et al.STR-Octree Indexing Method for Processing LiDAR Data[C]∥Proceedings of IEEE SENSORS.NewYork:IEEE Press,2015:1-4. [6]YANG H,QI W T,GAO X F.Efficient R-Tree Based Indexing Scheme for Server-Centric Cloud Storage System[J].IEEE Transactions on Knowledge and Data Engineering,2016,28(6):1503-1571. [7]DENG Y C,JACK C,CHENG P.Automatic Transformation of Different Levels of Detail in 3D GIS City Models in CityGML[J].International Journal of 3-D Information Modeling,2015,4(3):1-21. [8]NICHOLAS F P,ANKIT S,PETER S,et al.A Novel Level-of-Detail Technique for Virtual City Environments[C]∥Procee-dings of the 21st International Conference on Web3D Technology.New York:ACM,2016:183-184. [9]DENG H Y,WU F,ZHAI R J,et al.R-TreeIndex Structure for Multi-Scale Representation of Spatial Data[J].Chinese Journal of Computers,2009,32(1):177-184.(in Chinese) 邓红艳,武芳,翟仁健,等.一种用于空间数据多尺度表达的 R 树索引结构[J].计算机学报,2009,32(1):177-184. [10]SONG B Y,WANG J L,WANG Y,et al.Optimized Storage Strategy Research of HDFS Based on Vandermonde Code [J].Chinese Journal of Computers,2015,38(9):1825-1837.(in Chinese) 宋宝燕,王俊陆,王妍,等.基于范德蒙码的HDFS优化存储策略研究[J].计算机学报,2015,38(9):1825-1837. [11]VINITHA R G,NIKHIL T.Indexing HDFS Data in PDW:Splitting the data from the index[C]∥Proceedings of the 40th International Conference on Very Large Database.NewYork:PVLDB,2014:1520-1528. [12]JAMES G,SHA N H,LIANGD.Large Scale Distributed Data Science from Scratch using Apache Spark 2.0[C]∥Proceedings of the 26th International Conference on World Wide Web Companion.New York:ACM,2017:955-957. [13]RANDALL T W,MICHAEL B P,SARA M A,et al.Spatial indexing and analytics on Hadoop[C]∥Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems.NewYork:ACM,2014:73-82. [14]ZHONG Y Q,ZHU X M,FANG J Y.Elastic and Effective Spatio-Temporal Query Processing Scheme on Hadoop[C]∥Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analyticsfor Big GeospatialData.NewYork:ACM,2012:33-42. [15]AHMED E,MOHAMED F M.A Demonstration of SpatialHadoop:AnEfficientMapReduce Framework for Spatial Data[J].PVLDB Endowment,2013,6(12):1230-1233. [16]WAIL Y A,SATTAM A,MICHAEL J C.Largescale Complex Analytics on Semi-structured Datasets using AsterixDB and Spark[J].Proceedings of the Vidb Endowment,2016,9(13):1585-1588. [17]GU J P,WU Z B,LI Y L,et al.Parallel Optimization of Pixel Purity Index Algorithm for Hyperspectral Unmixing Based on Spark[C]∥Proceedings of Third International Conference on Advanced Cloud and Big Data.New York:IEEE Press,2015:159-166. [18]WANG F Y,WANG X Z,CUI W J,et al.Distributed retrieval for massive remote sensing image metadata on spark[C]∥Proceedings of 2016 IEEE International Geoscience and Remote Sensing Symposium.New York:IEEE Press,2016:5909-5912. [19]ZHANG R,QI J Z,MARTIN S,et al.Towards a Painless Index for Spatial Objects[J].ACM Transactions on Database Systems,2014,39(3):1-41. [20]PARTH P,DEEPAK G.Perfect Hashing Base R-tree for Multiple Queries[C]∥Proceedings of 2014 IEEE International Advance Computing Conference.NewYork:IEEE Press,2014:636-640. [21]BIAN C,YU J,YING C T,et al.Self-Adaptive Strategy for Cache Management in Spark[J].Acta Electronica Sinica,2017,45(2):278-284.(in Chinese) 卞琛,于炯,英昌甜,等.并行计算框架Spark的自适应缓存管理策略[J].电子学报,2017,45(2):278-284. [22]SHI J W,QIU Y J,UMAR F M,et al.Clash of the Titans:Map-Reduce vs.Spark for Large Scale Data Analytics [C]∥Procee-dings of the 41st International Conference on Very Large Database.New York:PVLDB,2015:2110-2121. |
[1] | 戴宏亮, 钟国金, 游志铭, 戴宏明. 基于Spark的舆情情感大数据分析集成方法 Public Opinion Sentiment Big Data Analysis Ensemble Method Based on Spark 计算机科学, 2021, 48(9): 118-124. https://doi.org/10.11896/jsjkx.210400280 |
[2] | 俞建业, 戚湧, 王宝茁. 基于Spark的车联网分布式组合深度学习入侵检测方法 Distributed Combination Deep Learning Intrusion Detection Method for Internet of Vehicles Based on Spark 计算机科学, 2021, 48(6A): 518-523. https://doi.org/10.11896/jsjkx.200700129 |
[3] | 杨宗霖, 李天瑞, 刘胜久, 殷成凤, 贾真, 珠杰. 基于Spark Streaming的流式并行文本校对 Streaming Parallel Text Proofreading Based on Spark Streaming 计算机科学, 2020, 47(4): 36-41. https://doi.org/10.11896/jsjkx.190300070 |
[4] | 朱岸青, 李帅, 唐晓东. Spark平台中的并行化FP_growth关联规则挖掘方法 Parallel FP_growth Association Rules Mining Method on Spark Platform 计算机科学, 2020, 47(12): 139-143. https://doi.org/10.11896/jsjkx.191000110 |
[5] | 禹鑫燚, 施甜峰, 唐权瑞, 殷慧武, 欧林林. 面向预测性维护的工业设备管理系统 Industrial Equipment Management System for Predictive Maintenance 计算机科学, 2020, 47(11A): 667-672. https://doi.org/10.11896/jsjkx.200100091 |
[6] | 邓定胜. 一种改进的DBSCAN算法在Spark平台上的应用 Application of Improved DBSCAN Algorithm on Spark Platform 计算机科学, 2020, 47(11A): 425-429. https://doi.org/10.11896/jsjkx.190700071 |
[7] | 贾宁, 李瑛达. 基于智能可穿戴设备的个性化健康监管平台的构建 Construction of Personalized Health Monitoring Platform Based on Intelligent Wearable Device 计算机科学, 2019, 46(6A): 566-570. |
[8] | 赵俊先, 喻剑. 基于RDD非序列化本地存储的Spark存储性能优化 Optimization of Spark RDD Based on Non-serialization Native Storage 计算机科学, 2019, 46(5): 143-149. https://doi.org/10.11896/j.issn.1002-137X.2019.05.022 |
[9] | 魏亮, 林子雨, 赖永炫. DFTS:面向大数据集的Top-k Skyline查询算法 DFTS:A Top-k Skyline Query for Large Datasets 计算机科学, 2019, 46(5): 150-156. https://doi.org/10.11896/j.issn.1002-137X.2019.05.023 |
[10] | 崔光范, 许利杰, 刘杰, 叶丹, 钟华. 基于Spark SQL的分布式全文检索框架的设计与实现 Design and Implementation of Distributed Full-text Search Framework Based on Spark SQL 计算机科学, 2018, 45(9): 104-112. https://doi.org/10.11896/j.issn.1002-137X.2018.09.016 |
[11] | 廖湖声, 黄珊珊, 徐俊刚, 刘仁峰. Spark性能优化技术研究综述 Survey on Performance Optimization Technologies for Spark 计算机科学, 2018, 45(7): 7-15. https://doi.org/10.11896/j.issn.1002-137X.2018.07.002 |
[12] | 石进平,李劲,和凤珍. 基于社交关系和用户偏好的多样性图推荐方法 Diversity Recommendation Approach Based on Social Relationship and User Preference 计算机科学, 2018, 45(6A): 423-427. |
[13] | 史经启,杨庚,孙彦珺,白双杰,闵兆娥. 支持浮点运算的高效并行全同态加密算法 Efficient Parallel Algorithm of Fully Homomorphic Encryption Supporting Operation of Floating-point Number 计算机科学, 2018, 45(5): 116-122. https://doi.org/10.11896/j.issn.1002-137X.2018.05.020 |
[14] | 彭徵, 王灵矫, 郭华. 基于随机森林的文本分类并行化 Parallel Text Categorization of Random Forest 计算机科学, 2018, 45(12): 148-152. https://doi.org/10.11896/j.issn.1002-137X.2018.12.023 |
[15] | 冯贵兰, 周文刚. 基于Spark平台的并行KNN异常检测算法 Spark-based Parallel Outlier Detection Algorithm of K-nearest Neighbor 计算机科学, 2018, 45(11A): 349-352. |
|