基于Spark的3D点云数据空间索引技术

doi:10.11896／j.issn.1002-137X.2018.09.035

摘要/Abstract

摘要： 针对Spark引擎不支持多维空间查询的问题,提出基于R树的二级空间索引,即在每个Worker节点上创建R子树,并将这些子树作为孩子,在Master节点上创建R树。针对LRU算法内存替换粒度粗、结果不够精确的问题,提出基于数据使用权重的内存替换方法。该方法将每次实际使用数据量与其总量的比值作为替换权重,将热点场景数据以RDD形式持久化至内存中,提高了基于内存查询的效率。根据远粗近细的视觉原理提出细节层次查询,该方法将最能代表物体特征的点云数据先传输给客户端,或者仅把简化模型点数据传给客户端,以解决网络带宽不足和数据加载延迟的问题。实验证明,文中方法能有效解决Spark多维空间的查询问题,查询效率得到了明显提高。

关键词: 3D点云数据, Spark, 多维空间索引, 数据使用权重, 细节层次, 虚拟旅游

Abstract: Two level spatial index based on R tree was presented according to the problem that spark engine doesn’t support multi-dimensional spatial query,that is,the R subtree is created on each worker node,and these subtrees are used as children to create the R tree on the master node.Memory replacement granularity of LRU algorithm is coarse,and the result is not accurate enough.For this reason,the method of memory replacement based on data usage weight was proposed.The ratio of actual used amount of data and its total amount is used as replacement weight.The method stores the hot scene data in RDD form into memory and improves the query efficiency based on memory.According to the visual principle of far thick and near fine,the level of detail query was presented.The point cloud data that best represent the object characteristics are firstly transmitted or the simplified model data are only transmitted to the client,so as to solve the problem of insufficient network bandwidth and data loading delay.Experimental results show that the proposed method can effectively solve the problem of multi-dimensional spatial query on spark,and the query efficiency is improved obviously.

Key words: 3D point cloud data, Data usage weight, Level of detail, Multi-dimensional spatial index, Spark, Virtual tourism

中图分类号:

TP392

赵尔平, 孟小峰. 基于Spark的3D点云数据空间索引技术[J]. 计算机科学, 2018, 45(9): 213-219. https://doi.org/10.11896／j.issn.1002-137X.2018.09.035

ZHAO Er-ping, MENG Xiao-feng. Spatial Index of 3D Point Cloud Data Based on Spark[J]. Computer Science, 2018, 45(9): 213-219. https://doi.org/10.11896／j.issn.1002-137X.2018.09.035

参考文献

[1]ELWIRA H,JERZY W,ROBERT S,et al.Color based Algo-rithm for Automatic Merging of Multiview 3D Point Clouds[J].ACM Journal on Computing and Cultural Heritage,2014,7(3):1-21.
[2]RUFAEL M,PABLO C.MP3DG-PCC Open Source Software
Framework for Implementation and Evaluation of Point Cloud Compression[C]∥Proceedings of the 2016 ACM Conference on Multimedia Conference.New York:ACM,2016:1222-1226.
[3]ROMULO G,TOMV T,KOSTIS K,et al.Aspatial column
store to triangulate the Netherlands on the fly[C]∥Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems.NewYork:ACM,2016:1-4.
[4]ZHANG R,QI J Z,MARTIN S,et al.Towardsa Painless Index for Spatial Objects[J].ACM Transactions on Database Systems,2014,39(3):1-19.
[5]RIZKI P N M,PARK J S,OH S,et al.STR-Octree Indexing Method for Processing LiDAR Data[C]∥Proceedings of IEEE SENSORS.NewYork:IEEE Press,2015:1-4.
[6]YANG H,QI W T,GAO X F.Efficient R-Tree Based Indexing Scheme for Server-Centric Cloud Storage System[J].IEEE Transactions on Knowledge and Data Engineering,2016,28(6):1503-1571.
[7]DENG Y C,JACK C,CHENG P.Automatic Transformation of Different Levels of Detail in 3D GIS City Models in CityGML[J].International Journal of 3-D Information Modeling,2015,4(3):1-21.
[8]NICHOLAS F P,ANKIT S,PETER S,et al.A Novel Level-of-Detail Technique for Virtual City Environments[C]∥Procee-dings of the 21st International Conference on Web3D Technology.New York:ACM,2016:183-184.
[9]DENG H Y,WU F,ZHAI R J,et al.R-TreeIndex Structure for Multi-Scale Representation of Spatial Data[J].Chinese Journal of Computers,2009,32(1):177-184.(in Chinese)
邓红艳,武芳,翟仁健,等.一种用于空间数据多尺度表达的 R 树索引结构[J].计算机学报,2009,32(1):177-184.
[10]SONG B Y,WANG J L,WANG Y,et al.Optimized Storage Strategy Research of HDFS Based on Vandermonde Code [J].Chinese Journal of Computers,2015,38(9):1825-1837.(in Chinese)
宋宝燕,王俊陆,王妍,等.基于范德蒙码的HDFS优化存储策略研究[J].计算机学报,2015,38(9):1825-1837.
[11]VINITHA R G,NIKHIL T.Indexing HDFS Data in PDW:Splitting the data from the index[C]∥Proceedings of the 40th International Conference on Very Large Database.NewYork:PVLDB,2014:1520-1528.
[12]JAMES G,SHA N H,LIANGD.Large Scale Distributed Data Science from Scratch using Apache Spark 2.0[C]∥Proceedings of the 26th International Conference on World Wide Web Companion.New York:ACM,2017:955-957.
[13]RANDALL T W,MICHAEL B P,SARA M A,et al.Spatial indexing and analytics on Hadoop[C]∥Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems.NewYork:ACM,2014:73-82.
[14]ZHONG Y Q,ZHU X M,FANG J Y.Elastic and Effective Spatio-Temporal Query Processing Scheme on Hadoop[C]∥Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analyticsfor Big GeospatialData.NewYork:ACM,2012:33-42.
[15]AHMED E,MOHAMED F M.A Demonstration of SpatialHadoop:AnEfficientMapReduce Framework for Spatial Data[J].PVLDB Endowment,2013,6(12):1230-1233.
[16]WAIL Y A,SATTAM A,MICHAEL J C.Largescale Complex Analytics on Semi-structured Datasets using AsterixDB and Spark[J].Proceedings of the Vidb Endowment,2016,9(13):1585-1588.
[17]GU J P,WU Z B,LI Y L,et al.Parallel Optimization of Pixel Purity Index Algorithm for Hyperspectral Unmixing Based on Spark[C]∥Proceedings of Third International Conference on Advanced Cloud and Big Data.New York:IEEE Press,2015:159-166.
[18]WANG F Y,WANG X Z,CUI W J,et al.Distributed retrieval for massive remote sensing image metadata on spark[C]∥Proceedings of 2016 IEEE International Geoscience and Remote Sensing Symposium.New York:IEEE Press,2016:5909-5912.
[19]ZHANG R,QI J Z,MARTIN S,et al.Towards a Painless Index for Spatial Objects[J].ACM Transactions on Database Systems,2014,39(3):1-41.
[20]PARTH P,DEEPAK G.Perfect Hashing Base R-tree for Multiple Queries[C]∥Proceedings of 2014 IEEE International Advance Computing Conference.NewYork:IEEE Press,2014:636-640.
[21]BIAN C,YU J,YING C T,et al.Self-Adaptive Strategy for
Cache Management in Spark[J].Acta Electronica Sinica,2017,45(2):278-284.(in Chinese)
卞琛,于炯,英昌甜,等.并行计算框架Spark的自适应缓存管理策略[J].电子学报,2017,45(2):278-284.
[22]SHI J W,QIU Y J,UMAR F M,et al.Clash of the Titans:Map-Reduce vs.Spark for Large Scale Data Analytics [C]∥Procee-dings of the 41st International Conference on Very Large Database.New York:PVLDB,2015:2110-2121.

相关文章 15

[1]	戴宏亮, 钟国金, 游志铭, 戴宏明. 基于Spark的舆情情感大数据分析集成方法 Public Opinion Sentiment Big Data Analysis Ensemble Method Based on Spark 计算机科学, 2021, 48(9): 118-124. https://doi.org/10.11896/jsjkx.210400280
[2]	俞建业, 戚湧, 王宝茁. 基于Spark的车联网分布式组合深度学习入侵检测方法 Distributed Combination Deep Learning Intrusion Detection Method for Internet of Vehicles Based on Spark 计算机科学, 2021, 48(6A): 518-523. https://doi.org/10.11896/jsjkx.200700129
[3]	杨宗霖, 李天瑞, 刘胜久, 殷成凤, 贾真, 珠杰. 基于Spark Streaming的流式并行文本校对 Streaming Parallel Text Proofreading Based on Spark Streaming 计算机科学, 2020, 47(4): 36-41. https://doi.org/10.11896/jsjkx.190300070
[4]	朱岸青, 李帅, 唐晓东. Spark平台中的并行化FP_growth关联规则挖掘方法 Parallel FP_growth Association Rules Mining Method on Spark Platform 计算机科学, 2020, 47(12): 139-143. https://doi.org/10.11896/jsjkx.191000110
[5]	禹鑫燚, 施甜峰, 唐权瑞, 殷慧武, 欧林林. 面向预测性维护的工业设备管理系统 Industrial Equipment Management System for Predictive Maintenance 计算机科学, 2020, 47(11A): 667-672. https://doi.org/10.11896/jsjkx.200100091
[6]	邓定胜. 一种改进的DBSCAN算法在Spark平台上的应用 Application of Improved DBSCAN Algorithm on Spark Platform 计算机科学, 2020, 47(11A): 425-429. https://doi.org/10.11896/jsjkx.190700071
[7]	贾宁, 李瑛达. 基于智能可穿戴设备的个性化健康监管平台的构建 Construction of Personalized Health Monitoring Platform Based on Intelligent Wearable Device 计算机科学, 2019, 46(6A): 566-570.
[8]	赵俊先, 喻剑. 基于RDD非序列化本地存储的Spark存储性能优化 Optimization of Spark RDD Based on Non-serialization Native Storage 计算机科学, 2019, 46(5): 143-149. https://doi.org/10.11896/j.issn.1002-137X.2019.05.022
[9]	魏亮, 林子雨, 赖永炫. DFTS:面向大数据集的Top-k Skyline查询算法 DFTS:A Top-k Skyline Query for Large Datasets 计算机科学, 2019, 46(5): 150-156. https://doi.org/10.11896/j.issn.1002-137X.2019.05.023
[10]	崔光范, 许利杰, 刘杰, 叶丹, 钟华. 基于Spark SQL的分布式全文检索框架的设计与实现 Design and Implementation of Distributed Full-text Search Framework Based on Spark SQL 计算机科学, 2018, 45(9): 104-112. https://doi.org/10.11896／j.issn.1002-137X.2018.09.016
[11]	廖湖声, 黄珊珊, 徐俊刚, 刘仁峰. Spark性能优化技术研究综述 Survey on Performance Optimization Technologies for Spark 计算机科学, 2018, 45(7): 7-15. https://doi.org/10.11896／j.issn.1002-137X.2018.07.002
[12]	石进平,李劲,和凤珍. 基于社交关系和用户偏好的多样性图推荐方法 Diversity Recommendation Approach Based on Social Relationship and User Preference 计算机科学, 2018, 45(6A): 423-427.
[13]	史经启,杨庚,孙彦珺,白双杰,闵兆娥. 支持浮点运算的高效并行全同态加密算法 Efficient Parallel Algorithm of Fully Homomorphic Encryption Supporting Operation of Floating-point Number 计算机科学, 2018, 45(5): 116-122. https://doi.org/10.11896/j.issn.1002-137X.2018.05.020
[14]	彭徵, 王灵矫, 郭华. 基于随机森林的文本分类并行化 Parallel Text Categorization of Random Forest 计算机科学, 2018, 45(12): 148-152. https://doi.org/10.11896／j.issn.1002-137X.2018.12.023
[15]	冯贵兰, 周文刚. 基于Spark平台的并行KNN异常检测算法 Spark-based Parallel Outlier Detection Algorithm of K-nearest Neighbor 计算机科学, 2018, 45(11A): 349-352.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed