Computer Science ›› 2019, Vol. 46 ›› Issue (9): 176-183. doi: 10.11896/j.issn.1002-137X.2019.09.025

• Information Security •

Web Log Analysis Method Based on Storm Real-time Streaming Computing Framework

YANG Li-peng, ZHANG Yang-sen, ZHANG Wen, WANG Jian, ZENG Jian-rong   

  1. (Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China)
  • Received: 2018-07-04 Online: 2019-09-15 Published: 2019-09-02

Abstract: With the rapid development of the Internet, network log data has grown explosively, and these logs contain a wealth of network security information. By analyzing network logs, this paper proposes an attack IP recognition model based on access behavior and network relationships, and an IP real-person attribute decision model based on a sliding time window. The proposed models are implemented on the Storm real-time stream computing framework to build a distributed platform for real-time computation and analysis of network logs, and solutions are given to the technical problems encountered during implementation. Experiments on real data show that the attack IP recognition model reaches an accuracy of 98% and the IP real-person attribute decision model reaches an accuracy of 96%. The platform can monitor network security effectively and promptly identify potential security risks in the network.

Key words: Storm, IP real rate, Attack IP identification, Distributed network log analysis platform

CLC Number: TP391
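
The abstract mentions a decision model built on a sliding time window but does not give its details. As a rough illustration of the general technique only, the sketch below counts requests per IP inside a moving time window. It is not the authors' model: the class name `SlidingWindowCounter`, the 60-second window, and the sample IP addresses are all hypothetical, and timestamps are assumed to arrive in non-decreasing order for each IP.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per key that fall inside the last `window_seconds`.

    Hypothetical helper for illustration; not taken from the paper.
    The window covers the half-open interval (now - window_seconds, now].
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # One queue of timestamps per IP, oldest on the left.
        self._events = defaultdict(deque)

    def _evict(self, queue, now):
        # Drop timestamps that have slid out of the window.
        cutoff = now - self.window_seconds
        while queue and queue[0] <= cutoff:
            queue.popleft()

    def add(self, ip, timestamp):
        """Record one request and return the count currently in the window."""
        queue = self._events[ip]
        queue.append(timestamp)
        self._evict(queue, timestamp)
        return len(queue)

    def count(self, ip, now):
        """Return how many recorded requests from `ip` are still in the window."""
        queue = self._events.get(ip)
        if queue is None:
            return 0
        self._evict(queue, now)
        return len(queue)


# Usage: three requests from one (made-up) IP, with a 60-second window.
counter = SlidingWindowCounter(window_seconds=60)
counter.add("10.0.0.1", 0)
counter.add("10.0.0.1", 30)
in_window = counter.add("10.0.0.1", 61)  # the request at t=0 has expired
```

A per-IP count like this could serve as one input feature to a decision rule; in a streaming framework the same state would normally be kept per key inside a processing node rather than in a single in-memory object.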