Computer Science, 2018, Vol. 45, Issue 11A: 353-355.

• Information Security •

Network Log Analysis Technology Based on Big Data

YING Yi¹, REN Kai², LIU Ya-jun³

  1. College of Computer Science and Technology, Sanjiang University, Nanjing 210012, China
  2. Jinling College, Nanjing University, Nanjing 210089, China
  3. School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
  • Online: 2019-02-26 Published: 2019-02-26
  • Corresponding author: YING Yi (born 1979), male, M.S., associate professor; main research interests: big data processing and databases. E-mail: 907635255@qq.com
  • About the authors: REN Kai (born 1979), female, M.S., lecturer; main research interests: distributed computing and databases. LIU Ya-jun (born 1953), female, professor, master's supervisor; main research interests: software engineering and database applications.
  • Funding: This work was supported by the General Program of Natural Science Research of Jiangsu Higher Education Institutions (17KJB520033).

Abstract: Traditional log analysis techniques hit a computational bottleneck when processing massive volumes of data. To address this problem, this paper studies a log analysis scheme based on big data technology, in which the storage, analysis, and mining of log files are distributed across multiple computers. A parallel network log analysis engine is built on the open-source Hadoop framework, and the IP statistics and outlier detection algorithms are reimplemented under the MapReduce model. Experiments show that applying big data technology to data-intensive computing significantly improves the execution efficiency of the algorithms and the scalability of the system.

Key words: Big data, Hadoop, Log analysis, MapReduce, Outlier detection

CLC Number: TP393
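
The abstract mentions reimplementing IP statistics under the MapReduce model but gives no code. The sketch below is only an illustration of that idea, not the authors' implementation: it simulates the map, shuffle, and reduce phases in plain Python on a single machine rather than on a Hadoop cluster. The function names (`map_ip`, `shuffle`, `reduce_ip`, `count_ips`) and the log format (each line starting with the client IPv4 address, as in common web server access logs) are assumptions made for the example.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: each log line begins with the client IPv4 address,
# followed by whitespace (as in common web server access logs).
IP_AT_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def map_ip(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (ip, 1) for a line that starts with an IPv4 address."""
    match = IP_AT_START.match(line)
    if match:
        yield match.group(1), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_ip(ip: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts for one IP address."""
    return ip, sum(counts)


def count_ips(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over an iterable of log lines."""
    mapped = (pair for line in lines for pair in map_ip(line))
    return dict(reduce_ip(ip, counts) for ip, counts in shuffle(mapped).items())


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - "GET / HTTP/1.1" 200 512',
        '10.0.0.2 - - "GET /a HTTP/1.1" 404 0',
        '10.0.0.1 - - "POST /login HTTP/1.1" 302 0',
        "malformed line",  # skipped: does not start with an IPv4 address
    ]
    print(count_ips(sample))
```

On a real cluster the map and reduce functions would run in parallel on separate nodes over blocks of the log files, and the grouping step would be handled by the framework. Because each map call depends only on its own line and each reduce call only on one key, the work can in principle be spread across more machines as the log volume grows.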