Computer Science ›› 2023, Vol. 50 ›› Issue (6): 313-321. doi: 10.11896/jsjkx.220500020

• Information Security •

Log Anomaly Detection Based on Log Template Topic Features

SUN Xuekui1, DAI Hua1,2, ZHOU Jianguo1, YANG Geng1,2, CHEN Yanli1

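  1. 1 School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023
    2 Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing 210023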
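  • Received: 2022-05-03 Revised: 2022-09-07 Online: 2023-06-15 Published: 2023-06-06
  • Corresponding author: DAI Hua (daihua@njupt.edu.cn)
  • About author: (kasonsun@foxmail.com)
  • Supported by:
    General Program of the National Natural Science Foundation of China (61872197, 61972209, 61902199, 61771251); China Postdoctoral Science Foundation (2019M651919); Natural Science Foundation of Nanjing University of Posts and Telecommunications (NY217119, NY219142)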

LTTFAD: Log Template Topic Feature-based Anomaly Detection

SUN Xuekui1, DAI Hua1,2, ZHOU Jianguo1, YANG Geng1,2, CHEN Yanli1   

  1. 1 School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    2 Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing 210023, China
  • Received: 2022-05-03 Revised: 2022-09-07 Online: 2023-06-15 Published: 2023-06-06
  • About author: SUN Xuekui, born in 1996, master. His main research interests include deep learning and system security. DAI Hua, born in 1982, Ph.D., professor. His main research interests include data management and security.
  • Supported by:
    National Natural Science Foundation of China (61872197, 61972209, 61902199, 61771251), Postdoctoral Science Foundation of China (2019M651919), and Natural Science Foundation of Nanjing University of Posts and Telecommunications (NY217119, NY219142).

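Abstract: In the field of system security, detecting software or system anomalies through logs is a common protection measure. With the rapid development of software and hardware, manually labeling large-scale log records has become very difficult, and a large body of research on log anomaly detection now exists. Existing automated log detection models all use log templates as their classes, so their performance and practicality are easily affected by changes to the log templates. This paper therefore proposes LTTFAD, a log anomaly detection model based on the topic features of log templates. LTTFAD is the first to introduce the LDA topic model to extract topic features from log templates, and it performs anomaly detection with an LSTM recurrent neural network. Experimental results show that on the HDFS and OpenStack datasets, LTTFAD clearly outperforms existing log-template-based anomaly detection models in precision, recall, and F1 score. In addition, LTTFAD remains highly stable when new log templates are injected.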

Keywords: Anomaly detection, Log analysis, Deep learning, LDA, Topic feature
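The precision, recall, and F1 score named in the abstract can be computed from a confusion count as in the minimal sketch below. The counts used are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts: 90 anomalies caught, 10 false alarms, 30 anomalies missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here an anomalous log sequence counts as the positive class, so a false positive is a normal sequence flagged as anomalous.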

Abstract: In the field of system security, using logs to detect software or system anomalies is a very popular method. With the rapid development of software and hardware, it has become hard to perform manual detection on logs at this scale, and there is now a large body of research on log anomaly detection. Existing automatic log anomaly detection approaches are all based on log templates, which makes them unstable when the log templates are modified. This paper proposes LTTFAD, a log anomaly detection model based on the topic features of log templates. It is the first to use an LDA topic model to extract the topic features of log templates, and it implements anomaly detection with an LSTM recurrent neural network. Experimental results show that the proposed model outperforms existing models on the HDFS and OpenStack datasets on metrics such as precision, recall, and F1 score. In addition, LTTFAD remains highly stable when new log templates are injected.

Key words: Anomaly detection, Log analysis, Deep learning, LDA, Topic feature
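As a rough illustration of the idea in the abstract, the sketch below turns log templates into LDA topic-distribution vectors with a small collapsed Gibbs sampler written in plain Python. It is not the authors' implementation: the example templates, the tokenizer, the number of topics, and the hyperparameters `alpha` and `beta` are all assumptions made here, and the LSTM stage that would consume these vectors is omitted.

```python
import random


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word w assigned to topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions serve as the feature vectors.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        features.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return features


def tokenize(template):
    """Hypothetical tokenizer: lowercase words, dropping the <*> parameter placeholder."""
    return [tok.lower() for tok in template.split() if tok != "<*>"]


# Illustrative templates only, written in the style of parsed HDFS log templates.
templates = [
    "Receiving block <*> src <*> dest <*>",
    "Received block <*> of size <*> from <*>",
    "PacketResponder <*> for block <*> terminating",
    "Deleting block <*> file <*>",
]
topic_features = lda_gibbs([tokenize(t) for t in templates], num_topics=2)

# A log session (sequence of template indices) becomes a sequence of topic vectors,
# which is the kind of input a sequence model such as an LSTM could consume.
session = [0, 1, 2, 3]
sequence_features = [topic_features[i] for i in session]
```

Because each template is represented by a topic vector rather than a template ID, a template unseen during training can still be mapped into the same feature space, which is one plausible reading of the stability claim in the abstract.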

CLC Number: 

  • TP391