Computer Science, 2023, Vol. 50, Issue (6): 313-321. doi: 10.11896/jsjkx.220500020
孙雪奎1, 戴华1,2, 周建国1, 杨庚1,2, 陈燕俐1
SUN Xuekui1, DAI Hua1,2, ZHOU Jianguo1, YANG Geng1,2, CHEN Yanli1
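Abstract: In system security, detecting software or system anomalies from logs is a common protective measure. With the rapid development of software and hardware, manually labeling large-scale log records has become very difficult, and a large body of research on log anomaly detection now exists. Existing automated log detection models all use log templates as their classification categories, so their performance and practicality are easily affected by changes in the log templates. To address this, LTTFAD, a log anomaly detection model based on the topic features of log templates, is proposed. LTTFAD is the first to introduce the LDA topic model to extract the topic features of log templates, and it performs anomaly detection with an LSTM recurrent neural network. Experimental results show that, on the HDFS and OpenStack datasets, LTTFAD clearly outperforms existing log-template-based anomaly detection models on precision, recall, and F1 score. In addition, LTTFAD remains highly stable when new log templates are injected.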
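The three evaluation metrics named in the abstract can be illustrated with a short Python sketch. It assumes binary labels in which 1 marks an anomalous log session and 0 a normal one; the function name `precision_recall_f1` and the sample labels are hypothetical and are not taken from the paper's experiments.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 marks an anomaly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal sessions wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for eight log sessions (illustrative only, not paper data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is commonly reported alongside the two individual scores.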