Computer Science, 2022, Vol. 49, Issue (10): 319-326. doi: 10.11896/jsjkx.210800268

Information Security

Field Segmentation of Binary Protocol Based on Probability Model

YANG Zi-ji, PAN Yan, ZHU Yue-fei, LI Xiao-wei   

  1. Strategic Support Force Information Engineering University, Zhengzhou 450001, China
    State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
  • Received: 2021-08-30 Revised: 2021-12-03 Online: 2022-10-15 Published: 2022-10-13
  • About author: YANG Zi-ji, born in 1994, postgraduate. His main research interests include protocol reverse engineering and network traffic classification.
    ZHU Yue-fei, born in 1962, Ph.D., professor, Ph.D. supervisor. His main research interests include information security and public key cryptography.
  • Supported by:
    National Key R&D Program of China (2019QY1300).

Abstract: Field segmentation is the basis of protocol format inference. The later steps of format inference, such as message structure identification, field semantic inference and field value constraint inference, depend heavily on the quality of field segmentation. Segmenting the fields of binary protocols is particularly challenging because binary messages lack character encoding and delimiters, field lengths are flexible, and field values can span wide ranges. To improve both feature construction and decision rules, this paper proposes a novel field segmentation method for binary protocols based on a probability model. First, it derives constraints on field boundaries from two sources: the internal structure of each message and the variation of values across messages. Then, it combines these constraints probabilistically, using a factor graph model to compute the probability that each position is a field boundary. Finally, it selects the most likely field boundaries according to these probabilities. Experiments show that the proposed method segments binary protocol fields more accurately and robustly than traditional methods.
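The three steps in the abstract (derive boundary cues within and across messages, fuse them in a factor graph, read off per-position boundary probabilities) can be illustrated with a small chain-structured factor graph. This is a minimal sketch, not the authors' implementation. The two cues (a jump in how many distinct values an offset takes across messages, and the average byte-value jump inside each message), their weights, the pairwise factor that discourages two adjacent boundaries, the 0.5 threshold, and the function names `boundary_evidence`, `chain_marginals` and `segment` are all hypothetical choices made for illustration. Inference is exact forward-backward (sum-product) on the chain.

```python
import math
from typing import List, Sequence, Tuple


def boundary_evidence(messages: Sequence[bytes]) -> List[float]:
    """Hypothetical log-potential that a field boundary sits just before offset i (i >= 1).

    Combines an inter-message cue (jump in the fraction of distinct values seen
    at an offset across messages) with an intra-message cue (average byte-value
    jump inside each message). Weights 4.0 / 2.0 and bias -1.0 are illustrative."""
    length = min(len(m) for m in messages)  # only offsets shared by all messages
    n = len(messages)
    variability = [len({m[k] for m in messages}) / n for k in range(length)]
    scores = []
    for i in range(1, length):
        inter = abs(variability[i] - variability[i - 1])
        intra = sum(abs(m[i] - m[i - 1]) for m in messages) / (255 * n)
        scores.append(4.0 * inter + 2.0 * intra - 1.0)
    return scores


def chain_marginals(unary: Sequence[float], adjacent_penalty: float) -> List[float]:
    """Sum-product on a chain of binary boundary variables b_0..b_{n-1}.

    unary[i] is the log-potential for b_i = 1 (b_i = 0 has log-potential 0);
    adjacent_penalty is subtracted whenever two neighbouring variables are both 1.
    Returns the exact marginals P(b_i = 1)."""
    n = len(unary)
    if n == 0:
        return []

    def phi(i: int, s: int) -> float:
        # Unary factor: evidence for (s=1) or against (s=0) a boundary at i.
        return math.exp(unary[i]) if s == 1 else 1.0

    def psi(a: int, b: int) -> float:
        # Hypothetical pairwise factor: discourage consecutive boundaries.
        return math.exp(-adjacent_penalty) if a == 1 and b == 1 else 1.0

    # Forward messages include phi_i; rescaled each step to avoid overflow.
    alpha = [[phi(0, 0), phi(0, 1)]]
    for i in range(1, n):
        row = [phi(i, s) * sum(alpha[-1][t] * psi(t, s) for t in (0, 1)) for s in (0, 1)]
        z = sum(row)
        alpha.append([x / z for x in row])
    # Backward messages cover positions i+1..n-1 only.
    beta = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        row = [sum(psi(s, t) * phi(i + 1, t) * beta[i + 1][t] for t in (0, 1)) for s in (0, 1)]
        z = sum(row)
        beta[i] = [x / z for x in row]
    marginals = []
    for a, b in zip(alpha, beta):
        p0, p1 = a[0] * b[0], a[1] * b[1]
        marginals.append(p1 / (p0 + p1))
    return marginals


def segment(messages: Sequence[bytes], threshold: float = 0.5,
            adjacent_penalty: float = 1.0) -> Tuple[List[int], List[float]]:
    """Return hypothetical field start offsets and the boundary probabilities."""
    probs = chain_marginals(boundary_evidence(messages), adjacent_penalty)
    # probs[k] refers to the gap before offset k + 1.
    starts = [k + 1 for k, p in enumerate(probs) if p >= threshold]
    return starts, probs


if __name__ == "__main__":
    # Toy trace: constant 2-byte header, then 2 bytes that vary per message.
    msgs = [bytes([1, 1, 10, 200]), bytes([1, 1, 50, 100]), bytes([1, 1, 90, 30])]
    starts, probs = segment(msgs)
    print(starts)  # [2]
```

On the toy trace the only reported boundary is before offset 2, separating the constant header from the varying bytes. Because the variables form a chain, sum-product yields exact marginals in linear time; constraints that link distant positions (for example, a length field governing a later boundary) would introduce loops and call for approximate inference such as loopy belief propagation. Thresholding marginals picks each boundary independently, whereas a max-product (Viterbi) pass would return the single most probable joint segmentation.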

Key words: Field segmentation, Factor graph, Probability model, Protocol reverse

CLC Number: TP393