计算机科学 ›› 2025, Vol. 52 ›› Issue (8): 136-145.doi: 10.11896/jsjkx.240600075

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于道格拉斯-普克算法的路网轨迹学习索引结构

缪祝青1, 韩京宇1,2, 李彩云1, 王彦之1, 毛毅1, 张怡婷1   

  1. 1 南京邮电大学计算机学院 南京 210000
    2 江苏省大数据安全与智能处理重点实验室(南京邮电大学) 南京 210000
  • 收稿日期:2024-06-11 修回日期:2024-09-04 出版日期:2025-08-15 发布日期:2025-08-08
  • 通讯作者: 韩京宇(jyhan@njupt.edu.cn)
  • 作者简介:(1022040911@njupt.edu.cn)
  • 基金资助:
    江苏省重点研发项目(BE2022065-5)

Douglas-Peuker Algorithm Based Learned Index Structure for Road Network Trajectroy Data

MIAO Zhuqing1, HAN Jingyu1,2, LI Caiyun1, WANG Yanzhi1, MAO Yi1, ZHANG Yiting1   

  1. 1 College of Computer Science,Nanjing University of Posts and Telecommunications,Nanjing 210000,China
    2 Jiangsu Key Laboratory of Big Data Security and Intelligent Processing,Nanjing University of Posts and Telecommunications,Nanjing 210000,China
  • Received:2024-06-11 Revised:2024-09-04 Online:2025-08-15 Published:2025-08-08
  • About author:MIAO Zhuqing,born in 1997,postgra-duate,is a student member of CCF(No.O8027G).His main research interests include machine learning and spatio-temporal database.
    HAN Jingyu,born in 1976,Ph.D,professor,is a member of CCF(No.E20-0012101M).His main research in-terests include biomedical data proces-sing,machine learning and spatio-temporal database.
  • Supported by:
    Key Research and Development Project of Jiangsu Province(BE2022065-5).

摘要: 近年来,基于位置服务的技术迅猛发展,产生了海量的路网轨迹数据。而路径范围查询作为一种路网轨迹查询类型,是支持其他查询类型的基础。为了实现对海量路网轨迹数据的高效索引,同时提供精确的路径范围查询服务,提出了一种基于道格拉斯-普克算法的学习型索引结构(Douglas-Peuker Based Learned Index Structure,DPLI)。首先将轨迹数据分为多个轨迹段,然后取轨迹段中的点作为轨迹数据的表征,利用映射函数将其映射为一维映射值序列,而后根据键值数量将其划分为多个数据分片。在分片内将首尾数据组成一条线段,然后计算其余数据点距离线段的拟合误差,将超过误差阈值的数据点作为新的线段端点,递归分割原有的直线段,直到所有数据点的拟合误差小于阈值,从而拟合分段线性函数。采用多个路网数据和轨迹数据进行了充分的实验,实验结果表明:与传统索引方法相比,DPLI具有更快的构建效率和磁盘访问效率;与学习索引方法相比,DPLI保持了构建效率的优势,并且达到了100%查询召回率。

关键词: 位置服务, 路网轨迹, 学习型索引, 范围查询, 道格拉斯-普克算法

Abstract: In recent years,the technology of location-based services has developed rapidly,which has produced a large amount of road network trajectory data.As a type of road network trajectory query,path range query is the basis to support other query types.This paper proposes a Douglas-Peuker based learned index structure(DPLI) based on Douglas-Purker algorithm,in order to efficiently index massive road network trajectory data and provide accurate path range query service.Firstly,the trajectory data is divided into multiple trajectory segments,then the midpoint of the trajectory segment is taken as the representation of the tra-jectory data,and the mapping function is used to map the sequence of one-dimensional mapping values,and then the trajectory data is divided into multiple data fragments according to the number of key values.In the fragment,the first and last data are formed into a line segment,and then the fitting error of the distance line segment of the remaining data points is calculated.The data points exceeding the error threshold are taken as the new line segment endpoints,and the original line segment is recursively divided until the fitting error of all data points is less than the threshold,so as to fit the piecewise linear function.The results show that compared with the traditional index methods,DPLI has faster construction efficiency and disk access efficiency.Compared to learning indexing methods,DPLI maintains the advantage of build efficiency and achieves a 100% query recall rate.

Key words: Location services, Road network trajectory data, Learned index, Range query, Douglas-Purker algorithm

中图分类号: 

  • TP392
[1]ZHANG S M,CAI P,LI C P,et al.High-dimensional Learned Index Based on Space Division and Dimension Reduction [J].Journal of Software,2023,34(5):2413-2426.
[2]SANDU POPA I,ZEITOUNI K,ORIA V,et al.Indexing in-network trajectory flows [J].The VLDB Journal,2011(20):643-669.
[3]MORRIS B T,TRIVEDI M M.Trajectory Learning for Activity Understanding:Unsupervised,Multilevel,and Long-Term Adaptive Approach [J].IEEE Transactions on Software Engineering,2011,33(11):2287-2301.
[4]TAO Y,PAPADIAS D.MV3R-tree:A spatio-temporal access method for timestamp and interval queries[C]//Proceedings of the 27th International Conference on Very Large Data Bases.San Francisco,CA:Morgan Kaufmann Publishers Inc.,2001:431-440.
[5]GUTTMAN A.R-trees:A dynamic index structure for sparial searching [J].ACM Sigmod Record,1984,14(2):47-57.
[6]BECKMANN N,KRIEGEL H P,SCHNEIDER R,et al.TheR*-tree:an efficient and robust access method for points and rectangles[C]//International Conference on Management of Data.New York:ACM,1990:322-331.
[7]SINGH M,ZHU Q,JAGADISH H V.SWST:A Disk Based Index for Sliding Window Spatio-Temporal Data[C]//2012 IEEE 28th International Conference on Data Engineering.IEEE,2012:342-353.
[8]ROBINSON J T.The K-D-B-tree:a search structure for large multidimensional dynamic indexes[C]//Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data.New York:ACM,1981:10-18.
[9]PFOSER D,JENSEN C S.Indexing of network constrainedmoving objects[C]//CIKM03:12th International Conference on Information and Knowledge Management.New York:ACM,2003:25-32.
[10]ZHANG R,QI J,STRADLING M,et al.Towards a Painless Index for Spatial Objects [J].ACM Transactions on Database Systems,2014,39(3):1-42.
[11]LI P,LU H,ZHENG Q,et al.LISA:A Learned Index Structure for Spatial Data[C]//International Conference on Management of Data.New York:ACM,2020:2119-2133.
[12]LI R,RUAN S,BAO J,et al.Querying Massive Trajectories by Path on the Cloud[C]//the 25th ACM SIGSPATIAL International Conference.New York:ACM,2017:1-4.
[13]KRASKA T,BEUTEL A,CHI E H,et al.The Case for Learned Index Structures[C]// Proceedings of the 2018 International Conference on Management of Data.New York:ACM,2018:489-504.
[14]DING J,MINHAS U F,YU J,et al.ALEX:An UpdatableAdaptive Learned Index[C]//International Conference on Management of Data.New York:ACM,2020:969-984.
[15]GALAKAATOS A,MARKOVITCH M,BINNIG C,et al.FITing-Tree:A Data-aware Index Structure[C]//Proceedings of the 2019 International Conference on Management of Data.New York:ACM,2019:1189-1106.
[16]WANG Y T,WANG Y S,YUAN Y.Survey of Learned Index[J].Computer Science,2023,50(1):1-8.
[17]WU F,HAN J Y,LIU Y,et al.Trajectory Index Based on Piecewise Linear Regression Tree[J].Journal of Chinese Computer Systems,2022,43(6):1245-1253.
[18]ZHANG Z,JIN P Q,XIE X K.Learning index:Current situation and research prospects [J].Journal of Software,2021,32(4):1129-1150.
[19]KRASKA T,ALIZADEH M.SageDB:An Instance-Optimized Data Analytics System [J].Proceedings of the VLDB Endowment,2019,15(13):4062-4078.
[20]WANG H,FU X,XU J,et al.Learned Index for Spatial Queries[C]//2019 20th IEEE International Conference on Mobile Data Management(MDM).IEEE,2019:569-574.
[21]QI J,LIU G,JENSEN C S,et al.Effectively learning spatial indices [J].Proceedings of the VLDB Endowment,2020,13(12):2341-2354.
[22]DAVITKOVA A,MILCHEVSKI E,MICHEL S.The ML-Index:A Multidimensional,Learned Index for Point,Range,and Nearest-Neighbor Queries[C]//International Conference on Extending Database Technology.Copenhagen:Open Procee-dings.org,2020:407-410.
[23]ZHANG S,RAY S,LU R,et al.Efficient Learned Spatial Index With Interpolation Function Based Learned Model [J].IEEE Transactions on Big Data,2023,9(2):733-745.
[24]NEWSON P,KRUMM J.Hidden Markov map matchingthrough noise and sparseness[C]//Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems.New York:ACM,2009:336-343.
[25]CHRISTOPHER H.Introduction to Real Analysis[M].New York:Wiley,2019:33-86.
[26]WU S T,MARQUEZ M R G.A non-self-intersection Douglas-Peucker algorithm[C]//Computer Graphics and Image Proces-sing.IEEE,2003:60-66.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!