Computer Science, 2019, Vol. 46, Issue (9): 176-183. doi: 10.11896/j.issn.1002-137X.2019.09.025

• Information Security •

Web Log Analysis Method Based on Storm Real-time Streaming Computing Framework

YANG Li-peng, ZHANG Yang-sen, ZHANG Wen, WANG Jian, ZENG Jian-rong

  1. (Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China)
  • Received: 2018-07-04 Online: 2019-09-15 Published: 2019-09-02
  • Corresponding author: ZHANG Yang-sen (born 1962), male, postdoctoral fellow, professor, CCF senior member. His main research interests are Chinese information processing and big data processing. E-mail: zhangyangsen@163.com
  • About the authors: YANG Li-peng (born 1991), male, master's student; main research interest: big data processing. ZHANG Wen (born 1997), female, master's student; main research interest: big data processing. WANG Jian (born 1993), male, master's student; main research interests: big data processing and event detection. ZENG Jian-rong (born 1993), male, master's student; main research interest: big data processing.
  • Supported by:
    National Natural Science Foundation of China (61772081)

Abstract: With the rapid growth of the Internet, network log data is increasing explosively, and these logs contain a wealth of network security information. By analyzing network logs, this paper proposes an attack IP identification model based on access behavior and network relationships, and an IP real-person attribute decision model based on a sliding time window. The proposed models are implemented on the Storm real-time streaming computing framework to build a distributed platform for real-time computation and analysis of network logs, and solutions are given to the technical problems encountered during implementation. Experiments on real data show that the labeling accuracy of the attack IP identification model reaches 98% and that of the IP real-person attribute decision model reaches 96%. The platform can monitor network security effectively and in real time, and can promptly identify security risks in the network.

Key words: Attack IP identification, Distributed network log analysis platform, IP real-person rate, Storm
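
The abstract names a sliding time window as the basis of the IP real-person attribute decision model but does not give its features or thresholds. As a rough illustration of the windowing idea only, the following Python sketch counts requests per IP over the most recent interval. The class name `SlidingWindowCounter`, the 60-second window, and the sample IP address are all hypothetical and are not taken from the paper; the paper's models run on Storm, not on this standalone code.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per IP inside a window of the last `window_seconds` seconds.

    Hypothetical helper for illustration; not the paper's decision model.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # One queue of timestamps per IP, oldest first.
        self._events = defaultdict(deque)

    def _evict(self, queue, now):
        # Drop timestamps that fall outside the window (now - window, now].
        while queue and queue[0] <= now - self.window_seconds:
            queue.popleft()

    def add(self, ip, timestamp):
        """Record one request and return the IP's count inside the window.

        Assumes timestamps for a given IP arrive in non-decreasing order.
        """
        queue = self._events[ip]
        queue.append(timestamp)
        self._evict(queue, timestamp)
        return len(queue)

    def count(self, ip, now):
        """Return how many requests from `ip` lie inside the window ending at `now`."""
        queue = self._events.get(ip)
        if queue is None:
            return 0
        self._evict(queue, now)
        return len(queue)


counter = SlidingWindowCounter(window_seconds=60)
first = counter.add("10.0.0.1", 0)    # 1 request in the window
second = counter.add("10.0.0.1", 30)  # 2 requests in the window
third = counter.add("10.0.0.1", 61)   # the request at t=0 has expired, so 2
later = counter.count("10.0.0.1", 200)  # everything has expired, so 0
```

A per-IP count of this kind could serve as one input to a decision rule, but which statistics the authors actually compute over the window is not stated in the abstract.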

CLC Number:

  • TP391