Computer Science, 2022, Vol. 49, Issue (7): 340-349. doi: 10.11896/jsjkx.210600127

• Information Security •

Click Streams Recognition for Web Users Based on HMM-NN

FEI Xing-rui, XIE Yi   

  1. Guangdong Province Key Laboratory of Information Security Technology, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  • Received: 2021-06-16 Revised: 2021-10-18 Online: 2022-07-15 Published: 2022-07-12
  • About author: FEI Xing-rui, born in 1993, postgraduate. His main research interest is cyber security.
    XIE Yi, born in 1973, Ph.D, associate professor. His main research interests include networking, network security, behavior modeling and algorithms.
  • Supported by:
    National Natural Science Foundation of China (61972431), Natural Science Foundation of Guangdong Province, China (2018A030313303) and Science and Technology Development Foundation Project of Ministry of Education (2018A06002).

Abstract: User behavior profile analysis is one of the key means of realizing network intelligence, and click-object recognition is an important foundation for constructing user behavior profiles. Most existing work is designed for the system side; it reflects user behavior only within a specific service domain and is not suitable for network-side detection and management. The main challenge for network-side user behavior analysis is that the network channel, at the bottom of the protocol stack, cannot obtain application-layer or system-side information and can rely only on IP data flows, which makes it difficult to build an effective network-side user behavior profile. This paper proposes a new method of user click-object recognition for the intermediate network. The method combines a hidden Markov model (HMM) with neural networks (NN). The HMM framework describes the dynamic behavior of click streams and non-click streams from the perspective of IP flows, while the NN establishes the relationship between the hidden states of the HMMs and complex network behavior characteristics. The attribute of a request sequence is determined by the degree of fit between the sequence and the behavior models. The main advantages of this scheme are that it inherits the parsing ability of the HMM and, through the embedded NN, enhances the HMM's ability to describe complex data. The proposed scheme does not involve the data content carried by IP flows, which makes it suitable for click behavior recognition in both encrypted and unencrypted network-side scenarios and effectively addresses the challenges faced by network-side user behavior profile analysis. Experimental results on multiple real data sets show that the three commonly used evaluation indicators F1, Kappa and AUC exceed 0.91, 0.83 and 0.96, respectively. These results indicate that the proposed scheme outperforms existing methods.
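
The decision rule described in the abstract, labelling a request sequence by how well it fits each behavior model, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two small HMMs (one for click streams, one for non-click streams) scored with the standard forward algorithm, a single hypothetical scalar feature per request, and a unit-variance Gaussian emission scorer standing in for the neural network that the paper embeds in the HMM. All names and parameter values (`gaussian_scorer`, `CLICK_MODEL`, `OTHER_MODEL`, the threshold) are illustrative assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(observations, log_init, log_trans, log_emit):
    """Forward algorithm in the log domain.

    log_emit(state, obs) returns log p(obs | state). In an HMM-NN hybrid this
    callable is where a trained network's output would be plugged in.
    """
    n = len(log_init)
    alpha = [log_init[s] + log_emit(s, observations[0]) for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            logsumexp([alpha[p] + log_trans[p][s] for p in range(n)])
            + log_emit(s, obs)
            for s in range(n)
        ]
    return logsumexp(alpha)


def gaussian_scorer(means):
    """Placeholder emission model (assumption): unit-variance Gaussian per state."""
    def log_emit(state, x):
        return -0.5 * (x - means[state]) ** 2 - 0.5 * math.log(2 * math.pi)
    return log_emit


def make_model(init, trans, means):
    """Bundle HMM parameters, converting probabilities to log space."""
    return {
        "log_init": [math.log(p) for p in init],
        "log_trans": [[math.log(p) for p in row] for row in trans],
        "log_emit": gaussian_scorer(means),
    }


# Hypothetical two-state models; the numbers are made up for illustration.
CLICK_MODEL = make_model([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], means=[0.0, 1.0])
OTHER_MODEL = make_model([0.5, 0.5], [[0.8, 0.2], [0.3, 0.7]], means=[4.0, 6.0])


def classify(sequence, click_model, other_model, threshold=0.0):
    """Label a sequence by the log-likelihood ratio of the two behavior models."""
    ll_click = log_likelihood(sequence, **click_model)
    ll_other = log_likelihood(sequence, **other_model)
    return "click" if ll_click - ll_other > threshold else "non-click"


if __name__ == "__main__":
    print(classify([0.1, 0.9, 0.2], CLICK_MODEL, OTHER_MODEL))
    print(classify([4.2, 5.8, 4.1], CLICK_MODEL, OTHER_MODEL))
```

Because the scorer is passed in as a callable, replacing the Gaussian placeholder with a network that maps richer flow features to per-state scores would leave the forward recursion and the decision rule unchanged. The threshold is a tunable assumption and would normally be chosen on validation data.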

Key words: Click streams recognition, HMM, Network side, NN

CLC Number: TP393