Computer Science ›› 2026, Vol. 53 ›› Issue (5): 164-173. doi: 10.11896/jsjkx.260100070

• Database & Big Data & Data Science •

Structure-aware Trace Clustering Method Based on Process Edit Distance

YE Jianhong, WU Yongjin, HUANG Hongkai

  1. College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
  • Received: 2026-01-13 Revised: 2026-03-11 Online: 2026-05-08
  • Corresponding author: YE Jianhong (leafever@hqu.edu.cn)
  • About author: YE Jianhong, born in 1976, Ph.D., associate professor, is a senior member of CCF (No. 14242M). His main research interests include data mining, fault diagnosis and prediction, intelligent manufacturing, and robotics.
  • Supported by:
    Guiding Project of Fujian Provincial Department of Science and Technology (2024H01010100).

Abstract: In model checking and process mining, trace clustering groups similar execution traces to support the construction of accurate behavioral models, the verification of model correctness, and data-driven model refinement. However, existing sequence-pattern-based trace clustering approaches typically treat traces as ordinary strings and neglect the inherent concurrent and cyclic execution relationships among activities. This simplification tends to discard structural information and consequently degrades clustering quality. To address this issue, this paper proposes a new trace similarity measure, referred to as process edit distance. The method first normalizes concurrent execution relationships among activities into a consistent sequential representation. It then abstracts loop sequences in the trace through a compression and simplification mechanism to reduce the influence of redundant repeated behavior. Finally, it measures trace similarity by jointly considering the activities themselves and the directly-follows relationships between them. Furthermore, to obtain process models that better reflect real business behavior, a post-processing strategy termed merging noise clusters is introduced within an agglomerative hierarchical clustering framework to alleviate the structural fragmentation caused by noise or small clusters. Experimental results show that the trace clustering algorithm based on process edit distance outperforms existing methods of the same category in clustering quality while exhibiting good stability and robustness. In addition, the merging noise clusters strategy consistently reduces the overall structural complexity of the clustering results, yielding clearer and more interpretable process models.

Key words: Model checking, Sequence patterns, Trace clustering, Process edit distance, Merging noise clusters

CLC Number: 

  • TP393
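
The three preprocessing and measurement steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' formulation: the abstract does not give the definition of process edit distance, so everything below is an assumption made for illustration. In particular, the concurrency relation is assumed to be supplied as a set of activity pairs, loop compression is assumed to collapse immediately repeated subsequences, and the final score is a stand-in that averages a normalized Levenshtein distance over activities with a Jaccard distance over directly-follows pairs. The function names (`normalize_concurrency`, `compress_loops`, `sketch_distance`) are hypothetical.

```python
def normalize_concurrency(trace, concurrent):
    """Put adjacent concurrent activities into one canonical (sorted) order.

    `concurrent` is an assumed input: a set of frozensets, one per pair of
    activities considered concurrent.
    """
    t = list(trace)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(t) - 1):
            a, b = t[i], t[i + 1]
            if a > b and frozenset((a, b)) in concurrent:
                t[i], t[i + 1] = b, a  # each swap removes one inversion
                swapped = True
    return t


def compress_loops(trace):
    """Collapse immediately repeated subsequences, e.g. d e d e -> d e."""
    t = list(trace)
    changed = True
    while changed:
        changed = False
        for w in range(1, len(t) // 2 + 1):
            i = 0
            while i + 2 * w <= len(t):
                if t[i:i + w] == t[i + w:i + 2 * w]:
                    del t[i + w:i + 2 * w]
                    changed = True
                else:
                    i += 1
    return t


def levenshtein(s, t):
    """Classic edit distance between two activity sequences."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def directly_follows(trace):
    """Set of (x, y) pairs where y occurs immediately after x."""
    return set(zip(trace, trace[1:]))


def sketch_distance(t1, t2, concurrent):
    """Stand-in score in [0, 1]; NOT the paper's process edit distance."""
    s1 = compress_loops(normalize_concurrency(t1, concurrent))
    s2 = compress_loops(normalize_concurrency(t2, concurrent))
    longest = max(len(s1), len(s2))
    activity_part = levenshtein(s1, s2) / longest if longest else 0.0
    df1, df2 = directly_follows(s1), directly_follows(s2)
    union = df1 | df2
    relation_part = 1 - len(df1 & df2) / len(union) if union else 0.0
    return (activity_part + relation_part) / 2


# Hypothetical example: b and c are concurrent, and (d, e) is a loop body.
concurrent_pairs = {frozenset(("b", "c"))}
looped = ["a", "c", "b", "d", "e", "d", "e", "f"]
plain = ["a", "b", "c", "d", "e", "f"]
print(sketch_distance(looped, plain, concurrent_pairs))  # 0.0
```

In this sketch, two traces that differ only in the interleaving of concurrent activities or in the number of loop iterations end up at distance zero, which is the kind of structural equivalence a plain string distance would miss. A pairwise matrix of such scores could then be passed to any agglomerative hierarchical clustering routine.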