Computer Science, 2018, Vol. 45, Issue (10): 160-165. doi: 10.11896/j.issn.1002-137X.2018.10.030

• Information Security •

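Identification of User’s Role and Discovery Method of Its Malicious Access Behavior in Web Logs

WANG Jian, ZHANG Yang-sen, CHEN Ruo-yu, JIANG Yu-ru, YOU Jian-qing

  1. Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China
  • Received: 2017-09-09 Online: 2018-11-05 Published: 2018-11-05
  • About the authors: WANG Jian (born 1993), male, master's student, CCF member; main research interests: big data processing and natural language processing; E-mail: 455858538@qq.com. ZHANG Yang-sen (born 1962), male, Ph.D., professor, CCF member; main research interests: natural language processing and artificial intelligence; E-mail: zhangyangsen@163.com (corresponding author). CHEN Ruo-yu (born 1982), male, Ph.D., lecturer; main research interest: natural language processing. JIANG Yu-ru (born 1978), female, Ph.D., associate professor; main research interest: natural language processing. YOU Jian-qing (born 1980), male, master's degree, lecturer; main research interest: natural language processing.
  • Funding:
    Supported by the National Natural Science Foundation of China (61370139, 61602044)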

Abstract: With the rapid development of Internet technology, a variety of malicious access behaviors endanger network information security, so identifying the roles of visiting users and recognizing their malicious access behaviors has both theoretical significance and practical value for network security. First, based on Web log data, an auxiliary IP database is built and used to construct a daily role model for each IP user. On this basis, a sliding time window is introduced so that changes over time are dynamically incorporated into role identification, yielding a dynamic user role identification model based on the sliding time window. Then, following an analysis of the traffic characteristics of malicious access, the user's access traffic features and information entropy features are weighted to construct a multi-feature model for identifying malicious access behavior. The model can identify bursty and highly persistent malicious access behaviors as well as malicious access that is small in volume but widely dispersed. Finally, the model is implemented with big data storage and Spark in-memory computing. Experimental results show that, when network traffic becomes abnormal, the proposed model can find the users exhibiting malicious access behavior and identify their roles accurately and efficiently, which verifies its effectiveness.

Key words: Data mining, Identification of user’s role, Malicious access behavior, Sliding time window, Web users
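
The abstract describes moving from a per-day role for each IP user to a role that tracks change over time through a sliding time window. The following is only a minimal sketch of that idea, not the authors' model: it assumes each day already has a role label, uses a majority vote over the last `window` days, and breaks ties in favour of the most recent label. The function name `windowed_roles` and the labels "visitor" and "crawler" are hypothetical.

```python
from collections import Counter, deque


def windowed_roles(daily_roles, window=3):
    """Return one role per day, decided by majority vote over a sliding window.

    daily_roles: list of per-day role labels for a single IP (assumed input).
    window: number of most recent days considered (assumed parameter).
    """
    recent = deque(maxlen=window)  # holds at most `window` latest daily roles
    result = []
    for role in daily_roles:
        recent.append(role)
        counts = Counter(recent)
        best = max(counts.values())
        # Tie-break: prefer the most recent role among the most frequent ones.
        for candidate in reversed(recent):
            if counts[candidate] == best:
                result.append(candidate)
                break
    return result


# Hypothetical daily roles for one IP address.
daily = ["visitor", "visitor", "crawler", "crawler", "crawler", "visitor"]
roles = windowed_roles(daily, window=3)
print(roles)
```

With a window of three days, a single day of changed behavior does not flip the role immediately; the windowed role changes only once the new label dominates the window. A larger window smooths more but reacts more slowly.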

CLC number:

  • TP391
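
The abstract states that access traffic features and information entropy features are weighted into one identification model, so that both bursty, persistent access and small but widely dispersed access can be caught. The sketch below illustrates one way such a weighted score could look. It is built on assumptions that are not taken from the paper: entropy is computed over the URLs requested by one IP, both features are normalized by caller-supplied maxima, and the weight `w_traffic`, the function names, and the sample data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def malicious_score(requests, max_requests, max_entropy, w_traffic=0.5):
    """Weighted sum of normalized traffic volume and normalized URL entropy.

    requests: list of URLs requested by one IP in the observation period.
    max_requests, max_entropy: assumed normalization constants.
    w_traffic: assumed weight of the traffic feature; entropy gets the rest.
    """
    traffic = min(len(requests) / max_requests, 1.0)
    entropy = min(shannon_entropy(requests) / max_entropy, 1.0)
    return w_traffic * traffic + (1.0 - w_traffic) * entropy


# Hypothetical access patterns for three IPs.
burst = ["/login"] * 1000                    # heavy traffic, one target
scan = ["/page%d" % i for i in range(8)]     # light traffic, many targets
normal = ["/a", "/a", "/b", "/b"]            # light traffic, few targets

for name, reqs in [("burst", burst), ("scan", scan), ("normal", normal)]:
    print(name, round(malicious_score(reqs, max_requests=1000, max_entropy=3.0), 3))
```

In this toy setup the burst pattern scores high through the traffic term and the scan pattern scores high through the entropy term, while the ordinary pattern stays low on both. How the real features are defined and how the weights are chosen would have to come from the full paper.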