Computer Science, 2026, Vol. 53, Issue (6A): 250800004-8. doi: 10.11896/jsjkx.250800004

• Big Data & Data Science •

Model-based Trajectory Anomalies Detection Algorithm for Longitudinal Data

DONG Dong, JIN Pengchao   

  1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
  • Online: 2026-06-16 Published: 2026-06-12
  • About author: DONG Dong, born in 1971, master, associate professor, master's supervisor. His main research interests include outlier detection and vulnerability mining.
  • Supported by:
    Humanities and Social Science Foundation of Hebei Normal University (S23JX003).

Abstract: Longitudinal data have attracted considerable attention in fields such as public health because they capture the dynamic trajectories of the same subjects over time. Data cleaning underpins the quality of longitudinal data modelling and directly affects downstream analysis. To support data cleaning, this paper proposes a novel anomaly detection method that integrates generalized linear model (GLM)-based polynomial fitting with adaptive clustering. The approach assigns a binary label (normal or abnormal) to each individual trajectory, and the labels are compared with ground-truth annotations. On four independent datasets (two simulated longitudinal cohorts, one additional UCR dataset, and one real-world clinical dataset), a systematic comparison against established R-package methods is conducted. Experimental results demonstrate superior detection performance and robust generalizability across diverse settings. Applying the proposed method to six years of height data from primary school students in a specific city further demonstrates its effectiveness and accuracy in detecting outlier trajectories in practical longitudinal health data. The method offers robust support for public health surveillance and intervention, disease progression assessment, and clinical decision support.
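
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each trajectory is summarised by the coefficients of a least-squares quadratic fit (a Gaussian GLM with identity link), that the number of k-means clusters is chosen by the Davies-Bouldin index, and that clusters holding less than a fraction `min_frac` of the trajectories are labelled abnormal. The function names, `k_max`, `min_frac`, and the synthetic height data (three curves recorded in the wrong unit) are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def polyfit(ts, ys, degree=2):
    """Least-squares polynomial fit, i.e. a Gaussian GLM with identity link.

    Solves the normal equations by Gauss-Jordan elimination and returns
    the coefficients [b0, b1, ..., b_degree].
    """
    n = degree + 1
    # Augmented matrix [X^T X | X^T y] for the design matrix X[i][j] = t_i ** j.
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)]
         + [sum(y * t ** i for t, y in zip(ts, ys))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n] for row in a]


def dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels, centers


def davies_bouldin(points, labels, centers):
    """Davies-Bouldin index (lower is better); inf if a cluster is empty."""
    k = len(centers)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            return float("inf")
        scatter.append(mean([dist(p, centers[c]) for p in members]))
    return mean([max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in range(k) if j != i) for i in range(k)])


def detect(trajectories, ts, k_max=4, min_frac=0.2):
    """Return the indices of trajectories that fall into small clusters."""
    feats = [polyfit(ts, ys) for ys in trajectories]
    # "Adaptive" step (assumption): keep the k with the lowest Davies-Bouldin index.
    labels, _ = min((kmeans(feats, k) for k in range(2, k_max + 1)),
                    key=lambda res: davies_bouldin(feats, *res))
    small = {c for c in set(labels) if labels.count(c) / len(labels) < min_frac}
    return {i for i, lab in enumerate(labels) if lab in small}


# Hypothetical data: six yearly height measurements (cm) per child.
rng = random.Random(0)
ts = list(range(6))


def child():
    base = 120 + rng.uniform(-1, 1)
    return [base + 6 * t + rng.uniform(-0.5, 0.5) for t in ts]


normal = [child() for _ in range(20)]
abnormal = [[y / 2.54 for y in child()] for _ in range(3)]  # recorded in inches
truth = {20, 21, 22}  # ground-truth abnormal indices

flagged = detect(normal + abnormal, ts)
precision = len(flagged & truth) / len(flagged) if flagged else 0.0
recall = len(flagged & truth) / len(truth)
print(sorted(flagged), precision, recall)
```

Clustering the fitted coefficients instead of the raw measurements means measurement noise is smoothed before distances are computed. Comparing `flagged` with `truth` mirrors the evaluation against ground-truth annotations described in the abstract. The small-cluster rule is only one possible way to turn clusters into binary labels.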

Key words: Longitudinal trajectory, Trajectory anomalies detection, Data cleaning, Unsupervised clustering

CLC Number: 

  • TP311.14