Computer Science (计算机科学), 2024, Vol. 51, Issue (4): 39-47. doi: 10.11896/jsjkx.231000118
XU Chenhan¹, HUANG He¹, SUN Yu'e², DU Yang¹ (徐宸涵, 黄河, 孙玉娥, 杜扬)
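Abstract: Obfs4 obfuscated traffic is one of the transport traffic types carried by the anonymous communication network Tor. Because of its strong anonymity, it is abused for illegal network activities, so identifying Obfs4 obfuscated traffic plays an important role in preventing cybercrime that exploits the Tor network. Existing identification strategies tend to focus on analyzing Obfs4 traffic features and apply machine learning or deep learning to complete flow samples for fine-grained identification. In online flow-identification scenarios, however, their time overhead is high, and their accuracy drops once Obfs4 enables its Inter-arrival Timing (IAT) anti-detection mode. To address this, we propose a multi-level pruning method for identifying Obfs4 obfuscated traffic from partial data. It collects only the first few packets of each flow and filters them in several fast rounds, and it is designed specifically around the characteristics of IAT mode, which improves both the efficiency and the robustness of Obfs4 traffic identification. The method splits identification into a handshake phase and an encrypted communication phase. In the handshake phase, it exploits the implicit semantics of Obfs4 handshake packets to perform coarse-grained, fast pruning based on randomness, timing, and length-distribution features. In the encrypted communication phase, it first extracts features from the first several packets of each flow, increases the weight of IAT-related features, and then applies XGBoost classification for fine-grained identification. Experimental results show that, on a dataset that includes obfuscated traffic using IAT mode, the first 30 to 50 packets of each flow suffice to reach 99% accuracy and precision, with an average processing time per flow on the order of milliseconds.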
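The abstract describes the two-stage pruning only at a high level. As a rough illustration, the Python sketch below shows what a coarse handshake filter and first-N-packet feature extraction might look like. It is not the authors' implementation. The randomness check is the standard NIST SP 800-22 frequency (monobit) test, used here as a stand-in: Obfs4 encodes its handshake key with Elligator and adds random padding, so the first client payload should look uniformly random. The length bounds, the p-value threshold, the `Packet` record, and the seven features are all hypothetical. The timing-distribution check and the XGBoost stage with boosted IAT feature weights are not shown.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds
    payload: bytes    # TCP payload of the packet
    outbound: bool    # True for client-to-server packets


# Hypothetical thresholds for illustration only; not values from the paper.
MIN_HANDSHAKE_LEN = 64
MAX_HANDSHAKE_LEN = 8192
P_THRESHOLD = 0.01


def monobit_p_value(data: bytes) -> float:
    """Frequency (monobit) test from NIST SP 800-22 applied to the bits of `data`."""
    n = len(data) * 8
    if n == 0:
        return 0.0
    ones = sum(bin(b).count("1") for b in data)
    s_n = 2 * ones - n  # sum of bits after mapping 0 -> -1 and 1 -> +1
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def handshake_prune(first_payload: bytes) -> bool:
    """Coarse stage (hypothetical criteria): keep a flow only if its first client
    payload has a plausible length and passes the randomness test."""
    if not (MIN_HANDSHAKE_LEN <= len(first_payload) <= MAX_HANDSHAKE_LEN):
        return False
    return monobit_p_value(first_payload) >= P_THRESHOLD


def flow_features(packets: list[Packet], n: int = 30) -> list[float]:
    """Fine-stage input: length, inter-arrival-time (IAT) and direction
    statistics over the first n packets of a flow."""
    head = packets[:n]
    if not head:
        return [0.0] * 7
    lengths = [len(p.payload) for p in head]
    # IATs between consecutive packets; a single packet yields one zero gap.
    iats = [b.timestamp - a.timestamp for a, b in zip(head, head[1:])] or [0.0]
    out_ratio = sum(p.outbound for p in head) / len(head)
    return [float(x) for x in (
        mean(lengths), pstdev(lengths), max(lengths),
        mean(iats), pstdev(iats), max(iats), out_ratio,
    )]


if __name__ == "__main__":
    print(handshake_prune(bytes(range(256))))  # True: balanced bits, plausible length
    print(handshake_prune(b"\x00" * 128))      # False: all-zero payload fails monobit
    pkts = [Packet(0.0, b"a" * 80, True), Packet(0.05, b"b" * 120, False)]
    print(flow_features(pkts))  # would be fed to a classifier such as XGBoost
```

A single bit-balance test is cheap enough to run on every new flow, which fits the goal of discarding most non-Obfs4 traffic before the costlier classifier runs. A practical filter would likely combine several randomness tests, because structured data such as `bytes(range(256))` also passes the monobit test.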