计算机科学 ›› 2025, Vol. 52 ›› Issue (11): 62-70.doi: 10.11896/jsjkx.241100052

• 数据库&大数据&数据科学 • 上一篇    下一篇

Truster:面向高效查询的自动驾驶轨迹数据聚类存储方案

王征权1, 彭智勇1,2   

  1. 1 武汉大学计算机学院 武汉 430000
    2 武汉大学大数据研究院 武汉 430000
  • 收稿日期:2024-11-07 修回日期:2025-02-16 出版日期:2025-11-15 发布日期:2025-11-06
  • 通讯作者: 彭智勇(peng@whu.edu.cn)
  • 作者简介:(zq_wang@whu.edu.cn)

Truster:Efficient Query-oriented Clustered Storage Solution for Autonomous Vehicle TrajectoryData

WANG Zhengquan1, PENG Zhiyong1,2   

  1. 1 School of Computer Science,Wuhan University,Wuhan 430000,China
    2 Big Data Institute,Wuhan University,Wuhan 430000,China
  • Received:2024-11-07 Revised:2025-02-16 Online:2025-11-15 Published:2025-11-06
  • About author:WANG Zhengquan,born in 2000,postgraduate,is a member of CCF(No.N4586G).His main research interests include mobile database and storage management.
    PENG Zhiyong,born in 1963,Ph.D supervisor,is a member of CCF(No.06132F).His main research interests include advanced database management and Web information management,etc.

摘要: 自动驾驶轨迹数据具有重要的研究与应用价值,其存储与查询技术得到了广泛的研究与关注。然而,现有轨迹数据管理方案主要针对一般轨迹数据设计,无法支持高频采样的自动驾驶轨迹数据的高效写入,且动态环境下高昂的索引维护开销使其难以满足动态更新与实时查询的需求。针对自动驾驶场景下高采样频率、高实时性的轨迹数据如何实现高频写入、动态更新、实时查询的问题,提出一种面向高效查询的自动驾驶轨迹数据聚类存储方案Truster。该方法设计了编码器和嵌入器,为原始轨迹生成空间感知键并提取特征向量;设计了基于日志结构合并树的存储结构CLSM树,以实现相似轨迹的集中存储;设计了LCC合并策略,在有序字符串表进行合并的同时,利用基于局部敏感哈希的分桶方法进行快速聚类;设计了轨迹查询算法,利用多粒度缓存和分桶映射快速筛选搜索空间。Truster不仅支持高频写入,还能适应动态工作负载的索引维护,且查询效率更高。在真实自动驾驶轨迹数据集Argoverse上进行的对比实验表明,与现有方法相比,Truster在写入操作上取得了20%~200%的加速,在查询操作上取得了20%~100%的加速。

关键词: 轨迹数据, 轨迹存储, 轨迹查询, 日志结构合并树, 局部敏感哈希

Abstract: Autonomous vehicle trajectory data holds significant research and practical value,attracting extensive attention to its storage and querying technologies.However,existing trajectory data management solutions are primarily designed for general trajectory data and exhibit notable limitations in efficiently writing autonomous driving trajectory data with high sampling frequency.Additionally,the high cost of index maintenance in dynamic environments makes it challenging to meet the demands of dynamic updates and real-time queries.To address the challenges of achieving high-frequency writes,dynamic updates,and real-time queries for high-sampling-rate and high-real-time trajectory data in autonomous driving scenarios,this paper proposes Truster,an efficient query-oriented clustered storage solution for autonomous vehicle trajectory data.This method includes the design of an encoder and embedder to generate position-aware keys for raw trajectories and extract feature vectors;a storage structure based on a Log-Structured Merge tree(LSM-tree)called the CLSM-tree to achieve clustered storage of similar trajectories;an LCC compaction strategy that leverages locality-sensitive hashing(LSH)for rapid clustering during the compaction of Sorted-String Tables(SSTables);and a trajectory query algorithm that uses multi-granularity cache and bucket mapping to quickly narrow down the search space.Truster not only supports high-frequency writes but also maintains index adaptability to dynamic workloads while offering enhanced query efficiency.Comparative experiments on the real-world autonomous vehicle trajectory dataset Argoverse demonstrate that Truster achieves a 20% to 200% improvement in write performance and a 20% to 100% improvement in query performance compared to existing methods.

Key words: Trajectory data, Trajectory storage, Trajectory query, LSM-tree, LSH

中图分类号: 

  • TP392
[1]ZEUCH S,CHATZILIADIS X,CHAUDHARY A,et al.NebulaStream:Data Management for the Internet of Things[J].Datenbank-Spektrum,2022,22(2):131-141.
[2]SCHELTER S,LANGE D,SCHMIDT P,et al.Automatinglarge-scale data quality verification[C]//Proceedings of the VLDB Endowment.2018:1781-1794.
[3]VANB J,O'BRIEN M,GRUYER D,et al.Autonomous vehicle perception:The technology of today and tomorrow[J].Transportation Research Part C:Emerging Technologies,2018,89:384-406.
[4]CHANG M F,LAMBERT J,SANGKLOY P,et al.Argoverse:3d tracking and forecasting with rich maps[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:8748-8757.
[5]CAESAR H,BANKITI V,LANG A H,et al.nuscenes:A multimodal dataset for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:11621-11631.
[6]ZIMÁNYI E,SAKR M,LESUISSE A.MobilityDB:A mobilitydatabase based on PostgreSQL and PostGIS[J].ACM Transactions on Database Systems,2020,45(4):1-42.
[7]ALSUBAIEE S,BEHM A,BORKAR V,et al.Storage management in AsterixDB[C]//Proceedings of the VLDB Endowment.2014:841-852.
[8]SHIN J,WANG J,AREF W G.The LSM RUM-tree:a log structured merge R-tree for update-intensive spatial workloads[C]//2021 IEEE 37th International Conference on Data Engineering.IEEE,2021:2285-2290.
[9]FANG Z,CHEN L,GAO Y,et al.Dragoon:a hybrid and efficient big trajectory management system for offline and online analytics[J].The VLDB Journal,2021,30:287-310.
[10]WANG S,BAO Z,CULPEPPER J S,et al.Torch:A search engine for trajectory data[C]//The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.2018:535-544.
[11]LAN H,XIE J,BAO Z,et al.Vre:a versatile,robust,and economical trajectory data system[C]//Proceedings of the VLDB Endowment.2022:3398-3410.
[12]BAO Y,HUANG Z,GONG X,et al.Optimizing segmented tra-jectory data storage with HBase for improved spatio-temporal query efficiency[J].International Journal of Digital Earth,2023,16(1):1124-1143.
[13]KIM Y S,KIM T,CAREY M J,et al.A comparative study of log-structured merge-tree-based spatial indexes for big data[C]//2017 IEEE 33rd International Conference on Data Engineering(ICDE).IEEE,2017:147-150.
[14]O'NEIL P,CHENG E,GAWLICK D,et al.The log-structured merge-tree(LSM-tree)[J].Acta Informatica,1996,33:351-385.
[15]KOGA H,ISHIBASHI T,WATANABE T.Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Ha-shing[J].Knowledge and Information Systems,2007,12:25-53.
[16]SANCHEZ I,AYE Z M M,RUBINSTEIN B I P,et al.Fast tra-jectory clustering using hashing methods[C]//2016 Internatio-nal Joint Conference on Neural Networks(IJCNN).IEEE,2016:3689-3696.
[17]CHANG F,DEAN J,GHEMAWAT S,et al.Bigtable:A distri-buted storage system for structured data[J].ACM Transactions on Computer Systems,2008,26(2):1-26.
[18]SEARS R,RAMAKRISHNAN R.bLSM:a general purpose log structured merge tree[C]//Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data.2012:217-228.
[19]LI P,LU H,ZHENG Q,et al.LISA:A learned index structure for spatial data[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.2020:2119-2133.
[20]JI J,LI J,YAN S,et al.Super-bit locality-sensitive hashing[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems.2012:108-116.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!