Computer Science, 2023, Vol. 50, Issue (5): 38-51. doi: 10.11896/jsjkx.220900030
LI Binghui, FANG Huan, MEI Zhenhui
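Abstract: Low-quality event logs caused by outliers and missing values are usually unavoidable in real business processes. Such logs degrade the performance of process mining algorithms and thereby interfere with correct decision-making. When the system's reference model is unknown, existing methods for log anomaly detection and repair require manually set thresholds, do not reveal which behavioral constraints the prediction model learns, and produce repair results with poor interpretability. The pre-trained language model BERT, which uses a masking strategy, learns the general semantics of text from context in a self-supervised manner. Inspired by this, the paper proposes the model BERT4Log and the theory of weak behavioral profiles, and combines them with a multi-layer multi-head attention mechanism to repair low-quality event logs interpretably. The proposed repair method needs no preset threshold and requires only a single round of self-supervised training. It uses weak behavioral profile theory to quantify the degree of log repair at the behavioral level, and uses the multi-layer multi-head attention mechanism to explain specific prediction results in detail. Finally, the method is evaluated on a set of public datasets and compared with the current best-performing work. The experimental results show that the overall repair performance of BERT4Log exceeds that of the compared method, and that it can learn weak behavioral profiles and explain repair results in detail.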
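The masking strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes a trace is a plain list of activity names, and the helper names `mask_trace` and `fill_masks`, the `[MASK]` token string, and the 15% default ratio (borrowed from the original BERT setup) are all hypothetical choices for illustration; the abstract does not specify how BERT4Log actually masks or repairs traces.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, following BERT's convention


def mask_trace(trace, mask_ratio=0.15, rng=None):
    """Hide a random subset of activities in one trace.

    Returns (masked_trace, labels), where labels[i] is the original
    activity at each masked position and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    if not trace:
        return [], []
    # Mask at least one position so every trace yields a training signal.
    n_mask = max(1, round(len(trace) * mask_ratio))
    positions = set(rng.sample(range(len(trace)), n_mask))
    masked = [MASK if i in positions else a for i, a in enumerate(trace)]
    labels = [a if i in positions else None for i, a in enumerate(trace)]
    return masked, labels


def fill_masks(masked, predictions):
    """Repair a trace by substituting predicted activities at masked positions.

    `predictions` maps a position index to the predicted activity; in a real
    system it would come from the trained model, which is out of scope here.
    """
    return [predictions[i] if a == MASK else a for i, a in enumerate(masked)]
```

In this sketch, self-supervised training would ask a model to recover `labels` from `masked`, and repairing a log with missing events would amount to placing `[MASK]` at the suspect positions and calling `fill_masks` with the model's predictions.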