Computer Science ›› 2019, Vol. 46 ›› Issue (9): 176-183.doi: 10.11896/j.issn.1002-137X.2019.09.025

• Information Security •

Web Log Analysis Method Based on Storm Real-time Streaming Computing Framework

YANG Li-peng, ZHANG Yang-sen, ZHANG Wen, WANG Jian, ZENG Jian-rong   

  1. (Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China)
  • Received: 2018-07-04 Online: 2019-09-15 Published: 2019-09-02

Abstract: With the rapid development of the Internet, network log data has grown explosively, and these logs contain a wealth of network security information. Based on an analysis of network logs, this paper proposes an attack IP identification model based on access behavior and network relationships, and an IP real-person attribute decision model based on a sliding time window. The proposed models are implemented on the Storm real-time stream computing framework to build a distributed platform for real-time computation and analysis of network logs, and solutions are given to the technical problems encountered during implementation. Experiments on real data show that the attack IP identification model achieves an accuracy of 98% and the IP real-person attribute decision model achieves an accuracy of 96%, and that the platform can monitor network security effectively and in a timely manner and promptly identify potential security risks in the network.

Key words: Storm, IP real rate, Attack IP identification, Distributed network log analysis platform

CLC Number: TP391
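
The abstract mentions a decision model based on a sliding time window but does not describe its features or thresholds. As a rough illustration of the general technique only (not the authors' model), the sketch below counts requests per IP over the most recent window. The class name `SlidingWindowCounter`, the 60-second window, and the sample IP address are all hypothetical, and timestamps are assumed to arrive in non-decreasing order for each IP.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Count events per key that fall within the last `window_seconds`.

    Hypothetical helper for illustration; the paper's actual model
    is not specified in the abstract.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # One queue of timestamps per key (for example, per source IP).
        self._events = defaultdict(deque)

    def _evict(self, queue, now):
        # Drop timestamps that are at least one full window older than `now`.
        cutoff = now - self.window_seconds
        while queue and queue[0] <= cutoff:
            queue.popleft()

    def add(self, key, timestamp):
        """Record one event and return the key's count in the current window."""
        queue = self._events[key]
        queue.append(timestamp)
        self._evict(queue, timestamp)
        return len(queue)

    def count(self, key, now):
        """Return the number of events for `key` still inside the window at `now`."""
        queue = self._events.get(key)
        if queue is None:
            return 0
        self._evict(queue, now)
        return len(queue)


# Example: three requests from one (made-up) IP, with a 60-second window.
counter = SlidingWindowCounter(window_seconds=60)
first = counter.add("10.0.0.1", 0)
second = counter.add("10.0.0.1", 30)
third = counter.add("10.0.0.1", 61)  # the request at t=0 has expired by now
```

In a streaming setting, a per-IP count like this could feed a downstream rule or classifier; which features the paper actually derives from the window is not stated in the abstract.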