Click Streams Recognition for Web Users Based on HMM-NN

doi:10.11896/jsjkx.210600127

Abstract

Abstract: User behavior profiling is one of the key means of making networks intelligent, and click-target recognition is an important basis for constructing user behavior profiles. Most existing work is designed for the system side; it can only reflect user behavior within a specific service domain and is not suited to network-side detection and management. The main challenge for network-side user behavior analysis is that the network pipe, which sits at the lower layers of the protocol stack, cannot obtain application-layer or system-side information and must rely on IP data flows alone, which makes it difficult to build an effective network-side user behavior profile. This paper therefore proposes a new click-target recognition method for intermediate networks that combines a hidden Markov model (HMM) with neural networks (NN). The HMM framework describes the dynamic behavior of click streams and non-click streams from the perspective of IP flows, while the NN establishes the relationship between the hidden states of the HMM and complex network-flow behavior features. A user's click target is recognized by evaluating how well the request sequence under test fits the HMM-NN behavior models. The main advantages of the scheme are that it inherits the analyzability of the HMM and uses the embedded NN to strengthen the HMM's ability to describe complex data. Because the scheme does not touch the data content carried by IP flows, it applies to network-side click recognition in both encrypted and unencrypted scenarios, and so addresses the difficulty facing network-side user behavior profiling. Experiments on multiple real data sets show that the three commonly used evaluation indicators, F1, Kappa and AUC, exceed 0.91, 0.83 and 0.96 respectively, and that the proposed scheme outperforms existing methods.
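The fit-based recognition described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption: it follows the classical hybrid HMM/NN recipe, in which a network outputs state posteriors that are divided by state priors to obtain scaled emission likelihoods, and the forward algorithm then scores a sequence. The class `HmmNn`, the function `classify`, the one-dimensional feature, and all parameter values are hypothetical and hand-picked, not learned from data.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def logsumexp(values):
    """log(sum(exp(v))) computed without overflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

class HmmNn:
    """Toy hybrid model: HMM dynamics plus a one-layer softmax network
    that maps a feature vector to hidden-state posteriors (assumed design)."""

    def __init__(self, start, trans, weights, bias, state_prior):
        self.start = start              # initial state probabilities
        self.trans = trans              # trans[i][j] = P(state j | state i)
        self.weights = weights          # one weight row per hidden state
        self.bias = bias
        self.state_prior = state_prior  # P(state), used to rescale posteriors

    def log_emissions(self, x):
        # Scaled likelihood: p(x | s) is proportional to P(s | x) / P(s).
        logits = [sum(w * v for w, v in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        post = softmax(logits)
        return [math.log(p) - math.log(q)
                for p, q in zip(post, self.state_prior)]

    def log_score(self, seq):
        """Forward algorithm in log space; returns the scaled log-likelihood."""
        n = len(self.start)
        emit = self.log_emissions(seq[0])
        alpha = [math.log(self.start[i]) + emit[i] for i in range(n)]
        for x in seq[1:]:
            emit = self.log_emissions(x)
            alpha = [logsumexp([alpha[i] + math.log(self.trans[i][j])
                                for i in range(n)]) + emit[j]
                     for j in range(n)]
        return logsumexp(alpha)

def classify(seq, click_model, non_click_model):
    """Label a request sequence by whichever model fits it better."""
    if click_model.log_score(seq) > non_click_model.log_score(seq):
        return "click"
    return "non-click"

# Hypothetical parameters: both models share one emission network and
# differ only in dynamics (states persist vs. states alternate).
shared = dict(weights=[[2.0], [-2.0]], bias=[0.0, 0.0], state_prior=[0.5, 0.5])
click_model = HmmNn(start=[0.5, 0.5], trans=[[0.9, 0.1], [0.1, 0.9]], **shared)
non_click_model = HmmNn(start=[0.5, 0.5], trans=[[0.1, 0.9], [0.9, 0.1]], **shared)
```

Because both models score the same observations, the factor p(x) dropped by the posterior-to-likelihood rescaling is common to both, so comparing the two scaled scores remains meaningful in this sketch. Calling `classify(seq, click_model, non_click_model)` on a list of feature vectors returns the label of the better-fitting model.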

Key words: Click stream recognition, Hidden Markov model (HMM), Neural networks (NN), Network side
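For readers unfamiliar with the three evaluation indicators named in the abstract (F1, Kappa and AUC), the following is a minimal sketch of how each is commonly computed for a binary click (1) versus non-click (0) task. The function names and the sample labels are illustrative assumptions; the paper's own evaluation code and data are not shown in this page.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
                   for c in (0, 1))
    if expected == 1.0:
        # Kappa is undefined here; this sketch reports 0.0 by choice.
        return 0.0
    return (observed - expected) / (1.0 - expected)

def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking:
    the fraction of (positive, negative) pairs ranked correctly,
    with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and scores, for illustration only.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
y_score = [0.9, 0.4, 0.5, 0.1]
```

F1 and Kappa take hard predictions, whereas AUC takes continuous scores (for example, a difference of model log-likelihoods), so it does not depend on a decision threshold.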

CLC Number: TP393

FEI Xing-rui, XIE Yi. Click Streams Recognition for Web Users Based on HMM-NN[J]. Computer Science, 2022, 49(7): 340-349. https://doi.org/10.11896/jsjkx.210600127
