Computer Science ›› 2024, Vol. 51 ›› Issue (11): 340-346. doi: 10.11896/jsjkx.231000121

• Information Security •

Malicious Encrypted Traffic Detection Method Based on Conversation Statistical Encoder Model

GONG Siyue, LIU Hui, WANG Baohui

  1. College of Software, Beihang University, Beijing 100000, China
  • Received: 2023-10-18 Revised: 2024-03-07 Online: 2024-11-15 Published: 2024-11-06
  • Corresponding author: WANG Baohui (wangbh@buaa.edu.cn)
  • About author: GONG Siyue (gongsy@buaa.edu.cn), born in 1996, postgraduate. His main research interests include natural language processing and malicious traffic detection.
    WANG Baohui, born in 1973, Ph.D., professor. His research interests include big data, artificial intelligence and network information security.

Abstract: With the development and widespread adoption of network technology, traffic encryption has become a key means of protecting user privacy. However, malware and attackers also use encrypted traffic to hide their behavior and evade traditional network intrusion detection systems. Existing methods for detecting malicious encrypted traffic have several shortcomings. Methods based on statistical features rely on expert experience for feature extraction, and features designed for one protocol do not generalize to others. Deep learning methods that take raw input suffer from data problems such as incomplete information and field padding, and therefore represent the semantics of encrypted traffic interactions poorly. To address these problems, this paper proposes the conversation statistic encoder model (CSEM). Unlike the conventional approach of feeding byte streams into a deep neural network, the method draws on the transformer encoder and introduces a new way of parsing packet features. It constructs a fixed-length vector representation for each packet without zero padding, avoids dependence on any specific encryption protocol during feature extraction, and builds a hybrid deep neural network, offering a new approach to malicious encrypted traffic detection. The model is evaluated on the DataCon dataset and a self-built dataset. On the public DataCon dataset it achieves a recall of 0.9911, a precision of 0.9407, and an F1 score of 0.9652, which is 9% higher than the F1 score of a random forest model, with all of these metrics reaching the current best level.
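The precision, recall, and F1 values quoted in the abstract can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The short sketch below does that arithmetic; the function name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for the DataCon dataset.
precision, recall = 0.9407, 0.9911
f1 = f1_score(precision, recall)
print(f"{f1:.4f}")  # prints 0.9652
```

The computed value agrees with the reported F1 score to four decimal places, so the three figures are mutually consistent.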

Key words: Conversation, Encrypted traffic detection, Encoder
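To make the idea of a fixed-length, padding-free per-packet representation more concrete, here is a minimal sketch. The abstract does not list the features CSEM actually uses, so the four statistics below (payload length, Shannon byte entropy, mean byte value, byte standard deviation) and the function name `packet_stats` are assumptions made purely for illustration. The point is only that payloads of any length map to vectors of the same size without truncation or zero padding, and without parsing any protocol fields.

```python
import math
from collections import Counter
from typing import List


def packet_stats(payload: bytes) -> List[float]:
    """Map a payload of any length to a 4-element statistical vector.

    Hypothetical features, not the paper's: length, Shannon entropy
    of the byte distribution (bits), mean byte value, byte std-dev.
    """
    n = len(payload)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    counts = Counter(payload)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    mean = sum(payload) / n
    variance = sum((b - mean) ** 2 for b in payload) / n
    return [float(n), entropy, mean, math.sqrt(variance)]


# A session becomes a sequence of equally sized vectors, one per packet,
# which is the kind of input a transformer-style encoder could consume.
packets = [b"\x16\x03\x01", bytes(range(256)), b"aaaa"]
session = [packet_stats(p) for p in packets]
```

Here a uniformly distributed payload such as `bytes(range(256))` has an entropy of 8 bits, while a constant payload such as `b"aaaa"` has an entropy of 0, so statistics of this kind can separate random-looking ciphertext from structured plaintext regardless of the protocol.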

CLC Number:

  • TP312