Computer Science ›› 2026, Vol. 53 ›› Issue (5): 164-173. doi: 10.11896/jsjkx.260100070

• Database & Big Data & Data Science •

Structure-aware Trace Clustering Method Based on Process Edit Distance

YE Jianhong, WU Yongjin, HUANG Hongkai   

  1. College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
  • Received: 2026-01-13 Revised: 2026-03-11 Published: 2026-05-08
  • About author: YE Jianhong, born in 1976, Ph.D., associate professor, is a senior member of CCF (No. 14242M). His main research interests include data mining, fault diagnosis and prediction, intelligent manufacturing, and robotics.
  • Supported by:
    Guiding Project of Fujian Provincial Department of Science and Technology (2024H01010100).

Abstract: In model checking and process mining, trace clustering groups similar execution traces to support the construction of accurate behavioral models, the verification of model correctness, and data-driven model refinement. However, existing sequence pattern-based trace clustering approaches typically treat traces as ordinary strings, neglecting the inherent concurrent and cyclic execution relationships among activities. This simplification often leads to the loss of structural information and consequently degrades clustering quality. To address this issue, this paper proposes a novel trace similarity measurement method, referred to as process edit distance. The proposed method first normalizes concurrent execution relationships among activities into a consistent sequential representation. It then abstracts repetitive loop behaviors through a compression and simplification mechanism to reduce the influence of redundant executions. Finally, trace similarity is measured by jointly considering activity occurrences and the direct-follow relationships between activities. Furthermore, to obtain process models that better reflect real business behavior, a post-processing strategy termed merging noise clusters is introduced within an agglomerative hierarchical clustering framework to alleviate structural fragmentation caused by noise or small-sized clusters. Experimental results demonstrate that the trace clustering algorithm based on process edit distance outperforms existing methods of the same category in terms of clustering quality, while exhibiting strong stability and robustness. In addition, the merging noise clusters strategy consistently reduces the overall structural complexity of the clustering results, leading to clearer and more interpretable process models.
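
The three preprocessing and comparison steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' process edit distance, whose exact definition is not given here. As a stand-in, it assumes that concurrency is supplied as a set of unordered activity pairs, that loop compression only collapses immediate repetitions of one activity, and that similarity is a weighted Jaccard distance over activities and direct-follow pairs. All function names and the `weight` parameter are hypothetical.

```python
from itertools import groupby


def normalize_concurrency(trace, concurrent_pairs):
    """Reorder adjacent activities assumed to be concurrent into sorted order.

    `concurrent_pairs` is an assumed input: a set of frozensets, each holding
    two activity labels that may execute in either order.
    """
    result = list(trace)
    changed = True
    while changed:
        changed = False
        for i in range(len(result) - 1):
            a, b = result[i], result[i + 1]
            if frozenset((a, b)) in concurrent_pairs and a > b:
                result[i], result[i + 1] = b, a
                changed = True
    return result


def compress_loops(trace):
    """Collapse immediate repetitions of one activity (a, a, a becomes a)."""
    return [activity for activity, _ in groupby(trace)]


def directly_follows(trace):
    """Return the set of (x, y) pairs where y occurs immediately after x."""
    return set(zip(trace, trace[1:]))


def jaccard_distance(x, y):
    """Jaccard distance between two sets; two empty sets are at distance 0."""
    union = x | y
    return 0.0 if not union else 1.0 - len(x & y) / len(union)


def illustrative_distance(t1, t2, concurrent_pairs=frozenset(), weight=0.5):
    """Stand-in distance (assumption): weighted mix of activity overlap and
    direct-follow overlap, computed after normalization and compression."""
    n1 = compress_loops(normalize_concurrency(t1, concurrent_pairs))
    n2 = compress_loops(normalize_concurrency(t2, concurrent_pairs))
    activity_part = jaccard_distance(set(n1), set(n2))
    follow_part = jaccard_distance(directly_follows(n1), directly_follows(n2))
    return weight * activity_part + (1 - weight) * follow_part
```

Under these assumptions, the traces `["a", "b", "c", "d"]` and `["a", "c", "b", "b", "d"]` with `b` and `c` declared concurrent normalize to the same sequence, so their distance is zero, whereas a plain string comparison would treat them as different.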

Key words: Model checking, Sequence patterns, Trace clustering, Process edit distance, Merge noise clusters

CLC Number: 

  • TP393