Computer Science ›› 2024, Vol. 51 ›› Issue (4): 39-47.doi: 10.11896/jsjkx.231000118

• Compact Data Structure

Multi-level Pruning Obfs4 Obfuscated Traffic Recognition Method Based on Partial Data

XU Chenhan1, HUANG He1, SUN Yu'e2, DU Yang1   

  1 School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  2 School of Rail Transportation, Soochow University, Suzhou, Jiangsu 215131, China
  • Received:2023-10-18 Revised:2023-11-27 Online:2024-04-15 Published:2024-04-10
  • Supported by:
    National Natural Science Foundation of China(62332013,62072322,U20A20182,62202322).

Abstract: Obfs4 obfuscated traffic, carried by the anonymous communication network Tor, is often misused for illicit online activities because of its strong anonymity. Identifying Obfs4 obfuscated traffic therefore plays a critical role in preventing cybercrime conducted over the Tor network. Existing methods analyze Obfs4 traffic features and use machine learning or deep learning techniques to identify entire flow samples precisely. However, processing complete flows often incurs considerable time overhead, and recognition accuracy decreases notably once Obfs4 incorporates inter-arrival timing (IAT) technology. In response, a multi-level pruning method for Obfs4 obfuscated traffic recognition based on partial data is proposed. The method collects only a small number of initial packets from each flow and passes them through several rounds of rapid filtering; it is specifically designed to improve the efficiency and reliability of Obfs4 traffic identification by focusing on the IAT pattern. The process is divided into two phases: a handshake phase and an encrypted communication phase. In the handshake phase, the method analyzes the semantics of Obfs4 handshake packets in depth, which enables quick coarse-grained filtering based on randomness, timing, and length distribution. In the encrypted communication phase, it extracts features from the first packets of each flow and gives greater weight to IAT-related features. Finally, fine-grained identification is performed with the XGBoost classifier. Experimental results indicate that, even when IAT technology is enabled, using the first 30 to 50 data packets of a flow yields 99% accuracy, with an average processing time per flow on the order of milliseconds.

Key words: Obfs4, Obfuscated traffic recognition, Multi-level pruning, Inter-arrival time reverse detection, XGBoost
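
To make the two phases in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the coarse handshake filter can be approximated by a byte-entropy check plus a length bound, and that IAT features are simple statistics over the first packets of a flow. The function names, the entropy threshold, and the length bound are hypothetical placeholders chosen for illustration; the final XGBoost classification stage is omitted.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Hypothetical thresholds, chosen only for illustration (not from the paper).
MIN_ENTROPY_BITS = 7.0       # bits per byte; uniformly random payloads approach 8
MAX_HANDSHAKE_LEN = 8192     # assumed upper bound on the first payload length


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the payload in bits per byte (0.0 for empty input)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def passes_handshake_filter(first_payload: bytes) -> bool:
    """Coarse pruning: keep a flow only if its first payload has a
    plausible length and looks random."""
    if not 0 < len(first_payload) <= MAX_HANDSHAKE_LEN:
        return False
    return byte_entropy(first_payload) >= MIN_ENTROPY_BITS


def iat_features(timestamps: list[float], n_packets: int = 30) -> dict[str, float]:
    """Inter-arrival-time statistics over the first n_packets of a flow."""
    ts = timestamps[:n_packets]
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        return {"iat_mean": 0.0, "iat_std": 0.0, "iat_max": 0.0}
    return {"iat_mean": mean(gaps), "iat_std": pstdev(gaps), "iat_max": max(gaps)}
```

In a pipeline of this shape, flows rejected by `passes_handshake_filter` would be discarded early, and only the surviving flows would have `iat_features` (together with other per-packet features) passed to a trained classifier.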

CLC Number: 

  • TP391