Computer Science, 2026, Vol. 53, Issue (5): 164-173. doi: 10.11896/jsjkx.260100070
叶剑虹, 吴永进, 黄鸿楷
YE Jianhong, WU Yongjin, HUANG Hongkai
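Abstract: In model checking and process mining, trace clustering groups similar execution traces, which supports building accurate behavioral models, verifying model correctness, and improving models on the basis of real data. However, existing sequence-pattern-based trace clustering methods usually treat traces as ordinary strings and ignore the concurrency and loop relations inherent between activities. This tends to lose structural information and degrades clustering quality. To address this problem, a new trace similarity measure, the process edit distance, is proposed. The method first normalizes the concurrent execution relations among the activities of a trace into a consistent sequential representation. It then abstracts the loop sequences in the trace through a compression and simplification mechanism, reducing interference from redundant repeated behavior. Finally, it measures the similarity between traces by jointly considering the activities themselves and the directly-follows relations between them. Furthermore, to obtain process models that better match actual business behavior from the clustering results, a post-processing strategy, merging noise clusters, is introduced within the agglomerative hierarchical clustering framework to alleviate the structural fragmentation caused by noise or small clusters. Experimental results show that the trace clustering algorithm based on the process edit distance achieves better clustering quality than existing methods of the same type and exhibits good stability and robustness. The merging noise clusters strategy also consistently reduces the overall structural complexity of the clustering results, yielding clearer and more interpretable process models.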
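The abstract names the steps of the measure but gives no formulas, so the following is only a minimal sketch of how such a pipeline might look, not the authors' definition. Every function name here is hypothetical. The sketch assumes that the set of concurrent activity pairs is supplied by the caller, that loops show up as immediately repeated blocks, that the activity view and the directly-follows view are each scored with a length-normalized Levenshtein distance and averaged with equal weight, and that a "noise cluster" is any cluster below a chosen minimum size.

```python
def normalize_concurrency(trace, concurrent):
    """Put adjacent concurrent activities into a fixed (sorted) order."""
    t = list(trace)
    changed = True
    while changed:
        changed = False
        for i in range(len(t) - 1):
            a, b = t[i], t[i + 1]
            # Assumption: `concurrent` holds frozensets of activity pairs.
            if frozenset((a, b)) in concurrent and a > b:
                t[i], t[i + 1] = b, a
                changed = True
    return t


def compress_loops(trace):
    """Collapse immediately repeated blocks, e.g. a b c b c d -> a b c d."""
    t = list(trace)
    size = 1
    while size <= len(t) // 2:
        i, removed = 0, False
        while i + 2 * size <= len(t):
            if t[i:i + size] == t[i + size:i + 2 * size]:
                del t[i + size:i + 2 * size]
                removed = True
            else:
                i += 1
        # After a removal, shorter repeats may have appeared: start over.
        size = 1 if removed else size + 1
    return t


def levenshtein(a, b):
    """Standard edit distance between two sequences of hashable items."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def process_edit_distance(t1, t2, concurrent=frozenset()):
    """Illustrative distance in [0, 1]; the 50/50 weighting is an assumption."""
    s1 = compress_loops(normalize_concurrency(t1, concurrent))
    s2 = compress_loops(normalize_concurrency(t2, concurrent))
    d_act = levenshtein(s1, s2) / max(len(s1), len(s2), 1)
    # Directly-follows view: the sequence of adjacent activity pairs.
    f1, f2 = list(zip(s1, s1[1:])), list(zip(s2, s2[1:]))
    d_df = levenshtein(f1, f2) / max(len(f1), len(f2), 1)
    return 0.5 * (d_act + d_df)


def merge_noise_clusters(clusters, min_size, dist):
    """Fold each cluster smaller than min_size into its nearest large cluster."""
    large = [list(c) for c in clusters if len(c) >= min_size]
    small = [c for c in clusters if 0 < len(c) < min_size]
    if not large:
        return [list(c) for c in clusters]
    # Compare against the original large clusters so merge order is irrelevant.
    anchors = [list(c) for c in large]
    for c in small:
        def avg(k):
            total = sum(dist(x, y) for x in c for y in anchors[k])
            return total / (len(c) * len(anchors[k]))
        large[min(range(len(anchors)), key=avg)].extend(c)
    return large


if __name__ == "__main__":
    bc = {frozenset("bc")}  # hypothetical: b and c may run concurrently
    print(process_edit_distance(list("abcd"), list("acbd"), bc))
    print(process_edit_distance(list("abcbcd"), list("abcd")))
```

Under these assumptions, two traces that differ only in the interleaving of concurrent activities, or only in the number of loop iterations, end up at distance zero, which is the behavior the abstract motivates. In an agglomerative setting, `merge_noise_clusters` would be applied once to the clusters obtained after cutting the dendrogram.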