Computer Science, 2022, Vol. 49, Issue (10): 319-326. doi: 10.11896/jsjkx.210800268

• Information Security •

Field Segmentation of Binary Protocol Based on Probability Model

YANG Zi-ji, PAN Yan, ZHU Yue-fei, LI Xiao-wei

  1. Strategic Support Force Information Engineering University, Zhengzhou 450001, China
    State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
  • Received: 2021-08-30 Revised: 2021-12-03 Online: 2022-10-15 Published: 2022-10-13
  • Corresponding author: ZHU Yue-fei (yfzhu17@sina.com)
  • About author: YANG Zi-ji (zijiyang@yeah.net), born in 1994, postgraduate. His main research interests include protocol reverse engineering and network traffic classification.
    ZHU Yue-fei, born in 1962, Ph.D, professor, Ph.D supervisor. His main research interests include information security and public key cryptography.
  • Supported by:
    National Key R&D Program of China (2019QY1300).

Abstract: Field segmentation is the basis of protocol format inference. The subsequent steps of format inference, such as message structure identification, field semantic inference and field value constraint inference, depend heavily on the quality of field segmentation. Segmenting the fields of a binary protocol is difficult because such protocols lack character encoding and delimiters, field lengths are flexible, and field value ranges vary widely. To address the single-dimensional feature construction and simplistic decision rules of related work, this paper proposes a binary protocol field segmentation method based on a probability model. First, it constructs field boundary constraints for binary protocol messages from dimensions such as the internal structure of each message and the value changes between messages. Then, it combines these constraints probabilistically and uses a factor graph model to compute the probability that each position is a boundary. Finally, the most likely field boundaries are derived from these probabilities. Experimental results show that, compared with traditional methods, the proposed method identifies binary protocol field boundaries more accurately and more robustly.

Key words: Field segmentation, Factor graph, Probability model, Protocol reverse engineering
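The idea summarized in the abstract, combining boundary constraints as factors and reading off the probability that each position is a boundary, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `byte_entropy` and `boundary_marginals`, the single inter-message feature (a jump in per-offset byte entropy), the pairwise factor that discourages two adjacent boundaries, the numeric weights, and the use of brute-force enumeration in place of a real factor graph inference algorithm.

```python
from itertools import product
from math import log2


def byte_entropy(messages, pos):
    """Shannon entropy (in bits) of the byte values observed at offset pos."""
    counts = {}
    for m in messages:
        counts[m[pos]] = counts.get(m[pos], 0) + 1
    n = len(messages)
    return -sum(c / n * log2(c / n) for c in counts.values())


def boundary_marginals(messages, strength=0.8, adjacency_penalty=0.5):
    """Return P(boundary) for each gap between byte i and byte i + 1.

    Hypothetical toy model: one unary factor per gap (inter-message
    evidence) and one pairwise factor per pair of neighbouring gaps.
    """
    length = min(len(m) for m in messages)
    ent = [byte_entropy(messages, p) for p in range(length)]
    gaps = length - 1

    # Unary factor per gap: [weight if no boundary, weight if boundary].
    # Assumption: an entropy jump above 0.5 bits hints at a boundary.
    unary = []
    for i in range(gaps):
        changed = abs(ent[i] - ent[i + 1]) > 0.5
        unary.append([1 - strength, strength] if changed
                     else [strength, 1 - strength])

    # Exact marginals by enumerating every joint assignment.
    # This is exponential in the number of gaps, so it only suits tiny inputs.
    totals = [0.0] * gaps
    z = 0.0
    for assign in product((0, 1), repeat=gaps):
        w = 1.0
        for i, b in enumerate(assign):
            w *= unary[i][b]
        # Pairwise factor: down-weight two boundaries in adjacent gaps.
        for i in range(gaps - 1):
            if assign[i] and assign[i + 1]:
                w *= adjacency_penalty
        z += w
        for i, b in enumerate(assign):
            if b:
                totals[i] += w
    return [t / z for t in totals]


# Four 4-byte messages: offsets 0, 1 and 3 are constant, offset 2 varies.
messages = [bytes([1, 0, i, 7]) for i in range(4)]
marginals = boundary_marginals(messages)
print(marginals)
```

In this toy input only the third byte varies, so the two gaps around it receive boundary evidence while the first gap does not. A realistic implementation would add intra-message features as further factors and replace the enumeration with belief propagation.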

CLC Number: TP393