Computer Science, 2020, Vol. 47, Issue (6A): 381-385. doi: 10.11896/jsjkx.191200155

• Information Security

HTTPS Encrypted Traffic Classification Method Based on C4.5 Decision Tree

ZOU Jie1, ZHU Guo-sheng1, QI Xiao-yun2 and CAO Yang-chen1

  1 School of Computer and Information Engineering, Hubei University, Wuhan 430062, China
  2 School of Chemistry and Chemical Engineering, Hubei University, Wuhan 430062, China
  • Published: 2020-07-07
  • Corresponding author: ZHU Guo-sheng (zhuguosheng@hubu.edu.cn)
  • About author: 292370368@qq.com
    ZOU Jie, born in 1996, postgraduate. Her main research interests include machine learning and network traffic analysis.
    ZHU Guo-sheng, born in 1972, Ph.D., professor. His main research interests include next-generation Internet and software-defined networks.
  • Supported by:
    This work was supported by the CERNET Next-Generation Internet Technology Innovation Project (NGII20180411).

Abstract: HTTPS builds on HTTP, which has no encryption mechanism of its own, by combining it with the SSL/TLS protocol. Before any data is transmitted, the client and server perform an SSL/TLS handshake in which they negotiate the cipher suite used for the session, securely exchange keys, and authenticate each other. Once this secure channel is established, HTTP application data is transmitted in encrypted form, protecting the communication content against eavesdropping and tampering. Traditional payload-based methods cannot handle encrypted traffic, so classification and analysis of encrypted traffic based on traffic characteristics and machine learning has become the mainstream approach. This paper builds a supervised learning model that, while keeping the encryption intact, applies the C4.5 decision tree algorithm to features engineered from network flow data. In a LAN environment, the model is used to analyze the HTTPS-encrypted data streams of the Tencent website, and it effectively achieves accurate classification of that site's HTTPS encrypted traffic by content module.

Key words: HTTPS, SSL/TLS, Classification, Encrypted traffic, Decision tree
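The abstract names C4.5 without describing how it grows a tree. As a rough illustration only (not the authors' implementation), the sketch below shows the core of C4.5-style induction on continuous flow features: every candidate binary split `feature <= threshold` is scored by its gain ratio, the information gain divided by the split information, and the tree grows recursively on the best-scoring split. The feature vectors (mean packet length in bytes, total flow size in KB), the class names `news` and `video`, and all function names are hypothetical. Full C4.5 also restricts candidates to splits with at least average information gain, handles missing values, and prunes the grown tree; this sketch omits all three.

```python
import math
from collections import Counter

# Hypothetical per-flow feature vectors: (mean packet length in bytes, total flow size in KB).
rows = [(320, 12), (350, 15), (300, 10), (1200, 900), (1350, 1100), (1280, 950)]
labels = ["news", "news", "news", "video", "video", "video"]


def entropy(ys):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(ys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(ys).values())


def gain_ratio(xs, ys, feature, threshold):
    """Gain ratio of the binary split xs[i][feature] <= threshold."""
    left = [y for x, y in zip(xs, ys) if x[feature] <= threshold]
    right = [y for x, y in zip(xs, ys) if x[feature] > threshold]
    if not left or not right:
        return 0.0  # a split that sends every sample one way carries no information
    n = len(ys)
    w_left, w_right = len(left) / n, len(right) / n
    info_gain = entropy(ys) - w_left * entropy(left) - w_right * entropy(right)
    # Split information penalizes splits that fragment the data into uneven parts.
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info


def best_split(xs, ys):
    """Return (gain_ratio, feature, threshold); feature is None if no split helps."""
    best = (0.0, None, None)
    for f in range(len(xs[0])):
        values = sorted({x[f] for x in xs})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            score = gain_ratio(xs, ys, f, t)
            if score > best[0]:  # strictly better only, so the first best split wins ties
                best = (score, f, t)
    return best


def build_tree(xs, ys, min_samples=2):
    """Grow an unpruned binary tree from non-empty data. Leaves are class labels;
    inner nodes are (feature, threshold, left_subtree, right_subtree) tuples."""
    majority = Counter(ys).most_common(1)[0][0]
    if len(set(ys)) == 1 or len(xs) < min_samples:
        return majority
    _, f, t = best_split(xs, ys)
    if f is None:
        return majority
    left = [i for i, x in enumerate(xs) if x[f] <= t]
    right = [i for i, x in enumerate(xs) if x[f] > t]
    return (
        f,
        t,
        build_tree([xs[i] for i in left], [ys[i] for i in left], min_samples),
        build_tree([xs[i] for i in right], [ys[i] for i in right], min_samples),
    )


def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


tree = build_tree(rows, labels)
print(tree)                         # (0, 775.0, 'news', 'video')
print(predict(tree, (330, 20)))     # news
print(predict(tree, (1300, 1000)))  # video
```

Both hypothetical features separate the toy classes perfectly, each with a gain ratio of 1.0; because only a strictly higher score replaces the current best, the search keeps the first perfect split it meets, mean packet length at 775 bytes.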

CLC number: TP181