Computer Science ›› 2020, Vol. 47 ›› Issue (6A): 556-560. doi: 10.11896/JsJkx.190900035

• Interdiscipline & Application •

Format Mining Method of Variable-length Domain in Private Binary Protocol

XU Xu-dong, ZHANG Zhi-xiang and ZHANG Xian

  1. College of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China
  • Published: 2020-07-07
  • Corresponding author: ZHANG Zhi-xiang (hgzzx@sina.com)
  • About author: qddxxd@163.com
    XU Xu-dong, born in 1995, candidate. His main research interests include software quality assurance, protocol reverse engineering, etc.
    ZHANG Zhi-xiang, born in 1967, Ph.D., associate professor. His main research interests include software quality assurance, artificial intelligence, etc.

Abstract: Protocol reverse engineering is an important step in fuzz testing. Existing work on private binary protocols offers no systematic method for mining the format of variable-length fields (domains), and the boundaries of the keyword fields inside them are mined poorly. To address both problems, a method is proposed that handles the length field and the keyword field of a variable-length field separately. For the length field, the results of progressive multiple sequence alignment are taken as input, and an iterative window is used to mine the global length field and the local length fields in turn. Tests on a data set constructed from the SNMP protocol show good boundary mining results. For the keyword field, existing methods cannot mine the front boundary, so the Voting Experts algorithm is improved by adding a reverse lookup tree, which allows the front and back boundaries of the keyword field to be mined at the same time. Tests on data sets constructed from the ICMP and HTTP protocols show a considerable improvement over the traditional Voting Experts algorithm.

Key words: Binary protocol, Iterative window, Progressive multiple sequence alignment, Protocol format mining, Voting Experts algorithm
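The length-field idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it skips the sequence alignment step, assumes the length field sits at the same byte offset in every message, assumes big-endian encoding, and only looks for a global length field (one whose value differs from the total message length by a constant). The function name `find_global_length_field` and its parameters are hypothetical.

```python
from typing import List, Tuple


def find_global_length_field(
    messages: List[bytes], max_width: int = 2
) -> List[Tuple[int, int, int]]:
    """Return (offset, width, delta) candidates for a global length field.

    A candidate is a window whose big-endian integer value varies between
    messages and satisfies value + delta == len(message) for one constant,
    non-negative delta. Fixed offsets and big-endian order are assumptions.
    """
    min_len = min(len(m) for m in messages)
    hits = []
    for width in range(1, max_width + 1):        # widen the window step by step
        for offset in range(min_len - width + 1):  # slide it over the common prefix
            values = [
                int.from_bytes(m[offset:offset + width], "big") for m in messages
            ]
            deltas = {len(m) - v for m, v in zip(messages, values)}
            varies = len(set(values)) > 1        # a constant field carries no length
            if varies and len(deltas) == 1 and min(deltas) >= 0:
                hits.append((offset, width, deltas.pop()))
    return hits


# Toy messages shaped like a tag byte, a one-byte length, then a payload.
msgs = [b"\x30\x03abc", b"\x30\x05hello", b"\x30\x01z"]
print(find_global_length_field(msgs))
```

On the toy messages, the only surviving window is the byte at offset 1, whose value is always two less than the message length. Local length fields, which describe only part of a message, would need the same test applied to sub-ranges and are left out here.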

CLC Number:

  • TP393
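The keyword-field idea in the abstract, mining a front boundary as well as a back boundary, can be sketched with boundary entropy computed in both directions. This is one plausible reading, not the authors' algorithm: it keeps only the entropy expert of Voting Experts, leaves out the frequency expert and the voting window, and uses plain n-gram counters in place of the forward and reverse trees. The names `successor_entropy` and `boundary_entropies` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def successor_entropy(sequences, n):
    """Map each n-gram to the entropy (in bits) of the symbol that follows it."""
    followers = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n):
            followers[seq[i:i + n]][seq[i + n]] += 1
    result = {}
    for gram, counts in followers.items():
        total = sum(counts.values())
        result[gram] = -sum(
            c / total * math.log2(c / total) for c in counts.values()
        )
    return result


def boundary_entropies(sequences, n):
    """Return (forward, backward) entropy tables keyed by n-gram.

    forward[g] is high when many different symbols follow g (likely back
    boundary); backward[g] is high when many different symbols precede g
    (likely front boundary). The backward table is built from the reversed
    sequences, standing in for a reverse lookup tree.
    """
    forward = successor_entropy(sequences, n)
    reversed_table = successor_entropy([s[::-1] for s in sequences], n)
    backward = {gram[::-1]: h for gram, h in reversed_table.items()}
    return forward, backward


# Toy messages: a varying byte, the keyword "GET", another varying byte.
samples = [b"\x01GET\x07", b"\x02GET\x08", b"\x03GET\x09", b"\x04GET\x0a"]
forward, backward = boundary_entropies(samples, 3)
print(forward[b"GET"], backward[b"GET"])
```

In the toy data the n-gram `GET` has high entropy on both sides, which marks both of its edges as boundary candidates, while an n-gram such as `\x01GE` has zero forward entropy because it is always followed by `T`.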