Computer Science ›› 2025, Vol. 52 ›› Issue (2): 99-106. doi: 10.11896/jsjkx.240600031

• Database & Big Data & Data Science •

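Check-in Trajectory and User Matching Method Based on Natural Language Augmentation

WANG Tianyi, LIN Youfang, GONG Letian, CHEN Wei, GUO Shengnan, WAN Huaiyu

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
    Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing 100044
  • Received: 2024-06-04 Revised: 2024-08-29 Online: 2025-02-15 Published: 2025-02-17
  • Corresponding author: WAN Huaiyu (hywan@bjtu.edu.cn)
  • About author: (wangtianyi@bjtu.edu.cn)
  • Supported by:
    General Program of the National Natural Science Foundation of China (62372031)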

Check-in Trajectory and User Linking Based on Natural Language Augmentation

WANG Tianyi, LIN Youfang, GONG Letian, CHEN Wei, GUO Shengnan, WAN Huaiyu   

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
    Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing 100044, China
  • Received:2024-06-04 Revised:2024-08-29 Online:2025-02-15 Published:2025-02-17
  • About author: WANG Tianyi, born in 2000, postgraduate. His main research interests include spatio-temporal data mining and deep learning.
    WAN Huaiyu, born in 1981, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No.17732D). His main research interests include spatial-temporal data mining, information extraction and social network mining.
  • Supported by:
    National Natural Science Foundation of China(62372031).

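Abstract (Chinese version): With the rapid development of positioning technology and sensors, user movement trajectory data is becoming increasingly abundant, but most of it is scattered across different platforms. To make full use of these data and accurately reflect users' real behavior, research on trajectory-user linking has become crucial. The task aims to accurately associate user identities with trajectories drawn from massive check-in data. In recent years, researchers have applied recurrent neural networks, attention mechanisms, and related methods to mine trajectory data in depth. However, current methods face two major challenges when processing user check-in trajectories: first, the limited spatiotemporal features in check-in data are insufficient to model check-in point information comprehensively from both subjective and objective perspectives; second, a user's check-in trajectory often revolves around a specific topic. To address these two challenges, a natural-language-augmented trajectory-user linking model (Natural Language Augmented Trajectory User Link, NLATUL) is proposed. First, a set of natural language templates and soft prompt tokens is designed to describe check-in trajectories, and a language model is used to understand the subjective intent in check-in points and fuse it with the user's spatiotemporal state, providing a way to model check-in points fully from both subjective and objective aspects. On this basis, the topic of a check-in trajectory is inferred through prompt learning, and the trajectory formed by the modeled check-in point representations is encoded bidirectionally; combining the trajectory topic with the trajectory encoding yields an accurate understanding of the user's check-in trajectory. Experimental results on two real-world check-in datasets show that NLATUL matches check-in trajectories to their corresponding users more accurately.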

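Keywords (Chinese version): Trajectory-user linking, Check-in sequence learning, Spatiotemporal data mining, Language model, Prompt learning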

Abstract: With the rapid development of positioning technology and sensors, user movement trajectory data is becoming increasingly abundant but is scattered across different platforms. To fully utilize these data and accurately reflect users' real behavior, the study of trajectory-user linking has become crucial. This task aims to accurately associate user identities with trajectories from massive check-in data. In recent years, researchers have used methods such as recurrent neural networks and attention mechanisms to mine trajectory data in depth. However, current methods face two major challenges when processing user check-in sequences. First, the limited spatiotemporal features in check-in data are insufficient to comprehensively model check-in point information from both subjective and objective perspectives. Second, the topic of a user's check-in sequence affects how the sequence should be understood and modeled. In response to these two challenges, this paper proposes a trajectory-user linking model based on natural language augmentation, named NLATUL. It designs a set of natural language templates and soft prompt tokens to describe the check-in sequence, and uses a language model to understand the subjective intention in the check-in points and integrate the user's spatiotemporal status, providing a representation that models check-in points from both subjective and objective aspects. On this basis, the paper infers the topic of the check-in sequence through prompt learning and performs bidirectional encoding on the trajectory represented by the modeled check-in points. Combining the check-in sequence topic with the check-in sequence encoding yields an accurate understanding of the user's check-in sequence, so that the trajectory can be linked with the user more effectively. Experimental results on two real-world check-in datasets show that the proposed method links check-in trajectories to their corresponding users more accurately.
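
The idea of describing a check-in sequence with natural language templates and soft prompt tokens can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the template wording, the `CheckIn` fields, the placeholder strings `[P1]`..`[P3]` standing in for learnable soft prompt tokens, and the `[MASK]` slot that a masked language model might fill with a topic. The sketch only builds the text prompt; it does not run a language model.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical placeholders for learnable soft prompt tokens (assumed names).
SOFT_PROMPT_TOKENS = ["[P1]", "[P2]", "[P3]"]


@dataclass
class CheckIn:
    """One check-in point; the field set is an assumption for this sketch."""
    timestamp: datetime
    latitude: float
    longitude: float
    category: str  # assumed point-of-interest category label


def describe_check_in(c: CheckIn) -> str:
    """Render a single check-in with an illustrative sentence template."""
    return (
        f"At {c.timestamp:%Y-%m-%d %H:%M}, the user checked in at a "
        f"{c.category} located at ({c.latitude:.3f}, {c.longitude:.3f})."
    )


def build_prompt(trajectory: list[CheckIn]) -> str:
    """Prefix soft prompt placeholders, describe each check-in in order,
    and end with a masked slot for the trajectory topic (assumed design)."""
    soft = " ".join(SOFT_PROMPT_TOKENS)
    body = " ".join(describe_check_in(c) for c in trajectory)
    return f"{soft} {body} The topic of this trajectory is [MASK]."


if __name__ == "__main__":
    demo = [
        CheckIn(datetime(2024, 6, 3, 8, 30), 39.951, 116.343, "coffee shop"),
        CheckIn(datetime(2024, 6, 3, 9, 10), 39.953, 116.347, "office"),
    ]
    print(build_prompt(demo))
```

In a real system the placeholder strings would map to trainable embeddings rather than literal text, and the masked slot would be scored by the language model; both parts are left out of this sketch.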

Key words: Trajectory user link, Check-in sequence learning, Spatiotemporal data mining, Language model, Prompt learning

CLC Number:

  • TP391