计算机科学 (Computer Science), 2022, Vol. 49, Issue (10): 319-326. doi: 10.11896/jsjkx.210800268
杨资集, 潘雁, 祝跃飞, 李小伟
YANG Zi-ji, PAN Yan, ZHU Yue-fei, LI Xiao-wei
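Abstract: Field segmentation is the foundation of protocol format inference. The subsequent steps, such as message structure identification, field semantics inference, and field value constraint determination, depend heavily on segmentation quality. Binary protocols lack character encoding and delimiters, their field lengths are flexible, and their value ranges vary widely, which makes field segmentation difficult. To address limitations of related work, such as features constructed along a single dimension and simplistic decision rules, this paper proposes a field segmentation method for binary protocols based on a probabilistic model. Taking binary protocol messages as the object of study, the method constructs field boundary constraints along several dimensions, including the intrinsic structure of each message and the variation of values across messages. It then combines these constraints probabilistically and uses a factor graph model to compute the probability that each position is a boundary, from which the most likely field boundaries are derived. Experimental results show that, compared with traditional methods, the proposed method identifies field boundaries in binary protocols with higher precision and greater robustness.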
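The idea of combining several boundary constraints in a factor graph and reading off a per-position boundary probability can be illustrated with a small sketch. This is not the authors' implementation: the function name `boundary_marginals`, the unary weights, and the `adjacent_penalty` factor are all hypothetical, and exact enumeration stands in for whatever inference procedure the paper uses. Each byte offset gets a binary variable (boundary or not), each unary factor encodes assumed evidence for that offset, and a pairwise factor discourages two boundaries at adjacent offsets.

```python
from itertools import product


def boundary_marginals(unary, adjacent_penalty=0.5):
    """Return P(offset i is a field boundary) for a tiny factor graph.

    unary[i] is a pair (weight_not_boundary, weight_boundary): hypothetical
    evidence for offset i, e.g. derived from value changes across messages.
    adjacent_penalty is a hypothetical pairwise factor that multiplies the
    weight of any assignment with boundaries at two neighbouring offsets.
    Exact enumeration is exponential in len(unary); it is used here only
    to keep the sketch short and verifiable.
    """
    n = len(unary)
    boundary_mass = [0.0] * n  # unnormalised mass where offset i is a boundary
    partition = 0.0            # normalising constant Z

    for assignment in product((0, 1), repeat=n):
        weight = 1.0
        # Unary factors: one per offset.
        for i, is_boundary in enumerate(assignment):
            weight *= unary[i][is_boundary]
        # Pairwise factors: one per pair of neighbouring offsets.
        for i in range(n - 1):
            if assignment[i] == 1 and assignment[i + 1] == 1:
                weight *= adjacent_penalty
        partition += weight
        for i, is_boundary in enumerate(assignment):
            if is_boundary:
                boundary_mass[i] += weight

    return [mass / partition for mass in boundary_mass]


if __name__ == "__main__":
    # Hypothetical evidence for three consecutive byte offsets.
    evidence = [(0.2, 0.8), (0.5, 0.5), (0.9, 0.1)]
    for offset, p in enumerate(boundary_marginals(evidence)):
        print(f"offset {offset}: P(boundary) = {p:.3f}")
```

With `adjacent_penalty=1.0` the pairwise factors have no effect and each marginal reduces to the normalised unary weight for that offset. Values below 1.0 lower the boundary probability of offsets whose neighbours are likely boundaries. For realistic message lengths, a message-passing algorithm such as belief propagation would replace the enumeration.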