Computer Science, 2023, Vol. 50, Issue (6): 313-321. doi: 10.11896/jsjkx.220500020

• Computer Network •

LTTFAD: Log Template Topic Feature-based Anomaly Detection

SUN Xuekui1, DAI Hua1,2, ZHOU Jianguo1, YANG Geng1,2, CHEN Yanli1   

  1 School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  2 Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing 210023, China
  • Received: 2022-05-03 Revised: 2022-09-07 Online: 2023-06-15 Published: 2023-06-06
  • About author: SUN Xuekui, born in 1996, master. His main research interests include deep learning and system security. DAI Hua, born in 1982, Ph.D., professor. His main research interests include data management and security.
  • Supported by:
    National Natural Science Foundation of China (61872197, 61972209, 61902199, 61771251), Postdoctoral Science Foundation of China (2019M651919) and Natural Science Foundation of Nanjing University of Posts and Telecommunications (NY217119, NY219142).

Abstract: In the field of system security, using logs to detect software or system anomalies is a very popular method. With the rapid development of software and hardware, manual inspection is no longer practical at the scale of modern logs, and log anomaly detection has been studied extensively. Existing automatic log anomaly detection approaches are all based on log templates, and they become unstable when log templates are modified. This paper proposes a log anomaly detection model based on the topic features of log templates. It first uses an LDA topic model to extract the topic features of log templates and then performs anomaly detection with an LSTM recurrent neural network. Experimental results show that the proposed model outperforms existing models on the HDFS and OpenStack datasets in most metrics, such as precision, recall and F1 score. In addition, the LTTFAD model remains highly stable when new log templates are injected.
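
To make the topic-feature idea concrete, the following is a minimal sketch of how an LDA model can turn log templates into topic-distribution vectors. It is not the paper's implementation: the function name `lda_topic_features`, the collapsed Gibbs sampler, the hyperparameters, and the example templates are all assumptions chosen for illustration, and the LSTM detection stage is not shown.

```python
import random
from collections import Counter


def lda_topic_features(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA (hypothetical helper).

    docs is a list of token lists; returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # z[d][i] is the topic currently assigned to token i of document d.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions serve as the feature vectors.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


# Made-up templates in the style of parsed HDFS logs; "<*>" marks a parameter slot.
templates = [
    "Receiving block <*> src <*> dest <*>",
    "Received block <*> of size <*> from <*>",
    "Deleting block <*> file <*>",
    "Verification succeeded for block <*>",
]
docs = [[tok.lower() for tok in t.split() if tok != "<*>"] for t in templates]
features = lda_topic_features(docs, num_topics=2)
for template, vec in zip(templates, features):
    print(template, [round(p, 3) for p in vec])
```

Each template is represented by a fixed-length probability vector rather than by a template ID, so a modified or newly injected template can still be mapped to a vector in the same feature space. In a pipeline of the kind the abstract describes, sequences of such vectors would then be fed to the LSTM.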

Key words: Anomaly detection, Log analysis, Deep learning, LDA, Topic feature

CLC Number: 

  • TP391