Computer Science ›› 2024, Vol. 51 ›› Issue (4): 39-47. doi: 10.11896/jsjkx.231000118

• Compact Data Structures •

Multi-level Pruning Obfs4 Obfuscated Traffic Recognition Method Based on Partial Data

XU Chenhan1, HUANG He1, SUN Yu'e2, DU Yang1

  1 School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  2 School of Rail Transportation, Soochow University, Suzhou, Jiangsu 215131, China
  • Received: 2023-10-18 Revised: 2023-11-27 Online: 2024-04-15 Published: 2024-04-10
  • Corresponding author: SUN Yu'e (sunye12@suda.edu.cn)
  • About the author: (chenhan_xu@outlook.com)
  • Supported by:
    National Natural Science Foundation of China (62332013, 62072322, U20A20182, 62202322).

Abstract: Obfs4 obfuscated traffic is a type of traffic carried by the anonymous communication network Tor. Because of its strong anonymity, it is abused for illicit online activities, so identifying Obfs4 obfuscated traffic plays an important role in preventing cybercrime conducted over the Tor network. Existing methods tend to focus on analyzing Obfs4 traffic features and apply machine learning or deep learning to complete flow samples for fine-grained identification. In online flow recognition scenarios, however, their time overhead is high, and their accuracy drops once Obfs4 enables its inter-arrival timing (IAT) anti-detection technique. To address this, a multi-level pruning method for Obfs4 obfuscated traffic recognition based on partial data is proposed. It collects only the first few packets of each flow for several rounds of rapid filtering, and its recognition is designed specifically around the characteristics of the IAT mode, improving both the efficiency and the robustness of Obfs4 traffic identification. The method divides recognition into a handshake phase and an encrypted communication phase. In the handshake phase, it mines the implicit semantics of Obfs4 handshake packets and performs coarse-grained fast pruning based on randomness, timing, and length distribution features. In the encrypted communication phase, it extracts features from the first several packets of each flow, raises the weight of IAT-related features, and finally uses the XGBoost classification method for fine-grained identification. Experimental results show that, on a dataset that includes obfuscated traffic with IAT enabled, using the first 30~50 packets of a flow achieves 99% accuracy and precision, with an average per-flow processing time on the order of milliseconds.

Key words: Obfs4, Obfuscated traffic recognition, Multi-level pruning, Inter-arrival timing anti-detection, XGBoost

CLC Number:

  • TP391
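
The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_len` and `alpha` thresholds, and the choice of the frequency (monobit) test as the randomness check are all assumptions made for illustration, since the abstract does not say which tests or thresholds the method uses. The final classifier (XGBoost in the paper) is left out so the sketch depends only on the Python standard library.

```python
import math
from statistics import mean, pstdev


def monobit_p_value(payload: bytes) -> float:
    """Frequency (monobit) test p-value; values near 0 suggest non-random data."""
    n = len(payload) * 8
    if n == 0:
        return 0.0
    ones = sum(bin(b).count("1") for b in payload)
    # Map bits to +1/-1, sum them, and normalise by sqrt(n).
    s_obs = abs(2 * ones - n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def passes_handshake_pruning(payload: bytes, min_len: int = 64,
                             alpha: float = 0.01) -> bool:
    """Coarse filter: keep a flow only if its first payload is long enough
    and looks random. min_len and alpha are illustrative, not from the paper."""
    return len(payload) >= min_len and monobit_p_value(payload) >= alpha


def early_flow_features(timestamps, lengths, n_packets: int = 30) -> dict:
    """Length and inter-arrival-time statistics over the first n_packets."""
    ts = list(timestamps)[:n_packets]
    ln = list(lengths)[:n_packets]
    if not ln:
        raise ValueError("flow has no packets")
    iats = [b - a for a, b in zip(ts, ts[1:])]
    return {
        "len_mean": mean(ln),
        "len_std": pstdev(ln),
        "iat_mean": mean(iats) if iats else 0.0,
        "iat_std": pstdev(iats) if iats else 0.0,
    }


if __name__ == "__main__":
    # An all-zero payload is rejected by the coarse filter.
    print(passes_handshake_pruning(bytes(64)))
    # Surviving flows would be summarised and passed to a classifier.
    print(early_flow_features([0.0, 0.1, 0.3], [100, 200, 300]))
```

In a pipeline of this shape, only flows that survive the cheap handshake check reach feature extraction and classification, which is where the per-flow time saving claimed for pruning would come from.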