Computer Science, 2022, Vol. 49, Issue (11A): 211100268-10. doi: 10.11896/jsjkx.211100268

• Big Data & Data Science •

DCPFS: Distributed Companion Patterns Mining Framework for Streaming Trajectories

ZHANG Kang-wei1, ZHANG Jing-wei1, YANG Qing2, HU Xiao-li1, SHAN Mei-jing3

  1. 1 Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China
    2 Guangxi Key Laboratory of Automatic Detecting Technology and Instruments (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China
    3 Department of Information Science and Technology, East China University of Political Science and Law, Shanghai 201620, China
  • Online: 2022-11-10 Published: 2022-11-21
  • Corresponding author: SHAN Mei-jing (shanmeijing@ecupl.edu.cn)
  • Author e-mail: zkw951@126.com
  • About author: ZHANG Kang-wei, born in 1995, postgraduate. His main research interests include large-scale trajectory data stream pattern mining and analysis.
    SHAN Mei-jing, born in 1979, Ph.D., associate professor. Her main research interests include big data analysis and information security.
  • Supported by:
    National Natural Science Foundation of China (61862013, U1811264, U1711263), Natural Science Foundation of Guangxi (2020GXNSFAA159117, 2018GXNSFAA281199), Key Project of Guangxi Key Laboratory of Trusted Software (KX202052) and Foundation of Guangxi Key Laboratory of Automatic Detection Technology and Instrument (YQ21102).

Abstract: The widespread use of positioning technology has produced massive volumes of spatio-temporal data collected as trajectory streams, and mining useful information from them has attracted growing attention. Mining companion patterns from trajectory streams means discovering groups of objects that behave in a highly similar way over the same period of time, which is essential for real-time applications such as traffic management and recommendation systems. However, existing approaches only achieve second-level response times and struggle to respond within milliseconds on large-scale trajectory data. This paper therefore proposes DCPFS, a distributed companion pattern mining framework for trajectory streams. The main contributions are: 1) To reduce the time that the density-based clustering algorithm DBSCAN spends on large-scale data, a data partitioning strategy and a cluster merging algorithm are designed on top of a distributed deployment, ensuring that clustering is both parallel and accurate. 2) Because real-world trajectories move in a direction, a direction dimension is added in the clustering stage to reduce redundant clusters. 3) Because the pattern mining stage intersects clustering results, a parallel intersection algorithm is designed to improve mining efficiency. 4) DCPFS is implemented on Flink, a distributed big data stream processing platform. Experiments on the Chengdu taxi GPS dataset and the Google life dataset show that the proposed framework responds faster than the baseline method.

Key words: Big data, Trajectory streams, Companion patterns, Density-based clustering, Distributed
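
The intersection step mentioned in the abstract can be illustrated with a small sketch. This is not the DCPFS algorithm itself, which this page does not specify: it is a sequential, single-machine illustration of the general idea of intersecting per-snapshot clusters to find groups that stay together. The function names `update_candidates` and `mine_companions`, and the parameters `min_size` and `min_duration`, are hypothetical and chosen only for this example. Clustering (DBSCAN, partitioning, merging, direction handling) is assumed to have already produced the clusters for each snapshot.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Group = FrozenSet[str]  # a set of moving-object ids


def update_candidates(
    candidates: Dict[Group, int],
    clusters: List[Group],
    min_size: int,
) -> Dict[Group, int]:
    """Intersect existing candidates with the clusters of a new snapshot.

    `candidates` maps a group of objects to the number of consecutive
    snapshots in which that group has stayed inside one cluster.
    """
    updated: Dict[Group, int] = {}
    for cluster in clusters:
        # A sufficiently large cluster always starts a fresh candidate.
        if len(cluster) >= min_size:
            updated[cluster] = max(updated.get(cluster, 0), 1)
        # Objects shared by an old candidate and this cluster are still together.
        for members, duration in candidates.items():
            common = members & cluster
            if len(common) >= min_size:
                updated[common] = max(updated.get(common, 0), duration + 1)
    return updated


def mine_companions(
    snapshots: Iterable[List[Set[str]]],
    min_size: int,
    min_duration: int,
) -> Set[Group]:
    """Return groups that stay clustered together for `min_duration` snapshots."""
    candidates: Dict[Group, int] = {}
    found: Set[Group] = set()
    for clusters in snapshots:
        frozen = [frozenset(c) for c in clusters]
        candidates = update_candidates(candidates, frozen, min_size)
        for members, duration in candidates.items():
            if duration >= min_duration:
                found.add(members)
    return found


if __name__ == "__main__":
    # Three snapshots of already-clustered object ids (made-up data).
    demo = [
        [{"a", "b", "c"}, {"d", "e"}],
        [{"a", "b", "d"}, {"c", "e"}],
        [{"a", "b", "e"}],
    ]
    print(mine_companions(demo, min_size=2, min_duration=3))
```

In this sketch each candidate is intersected with each cluster independently of the others, which is the property a parallel version could exploit by distributing the candidate-cluster pairs across workers.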

CLC Number:

  • TP311.13