Computer Science, 2020, Vol. 47, Issue (12): 319-326. doi: 10.11896/jsjkx.191000193

Section: Information Security

Message Format Inference Method Based on Rough Set Clustering

LI Yi-hao, HONG Zheng, LIN Pei-hong, FENG Wen-bo

  1. Army Engineering University of PLA, Nanjing 210000, China
  • Received: 2019-10-29 Revised: 2020-04-12 Published: 2020-12-17
  • Corresponding author: HONG Zheng (hz5215@163.com)
  • Author contact: enhancelee@foxmail.com
  • About author: LI Yi-hao, born in 1996, postgraduate. His main research interests include cyberspace security and protocol reverse engineering.
    HONG Zheng, born in 1979, Ph.D., associate professor. His main research interests include cyberspace security and protocol reverse engineering.
  • Supported by:
    National Key R&D Program of China (2017YFB0802900).

Abstract: Message clustering is the basis of message format inference. Most existing message clustering methods use the global similarity of messages as the clustering criterion; their accuracy is often low, which in turn degrades the accuracy of the subsequent message format extraction. To address this problem, this paper proposes a message format inference method based on rough set clustering, which consists of four phases: preprocessing, rough-set-based clustering, feature word extraction, and message format inference. First, preprocessing separates the target messages into business messages and control messages. Second, messages are clustered on their position attributes (statistical features of the messages) using the attribute-based sample partitioning of rough set theory; because this clustering captures local features of message sequences, it achieves good clustering accuracy. Third, protocol feature words are extracted according to their length, frequency and position characteristics. Finally, the protocol feature words are classified into mandatory fields and optional fields, which are then used to describe the message formats. Experimental results show that the proposed method can extract message formats accurately.

Key words: Message format inference, Message clustering, Rough set theory, Feature word extraction, Protocol reverse engineering
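The abstract names the clustering, feature word extraction and format inference phases without giving their internals. The Python sketch below illustrates one plausible reading of them, under assumptions that are ours and not the paper's: each rough-set attribute is the byte value at a fixed offset (the hypothetical `positions` list), so clusters are the equivalence classes of the indiscernibility relation; a feature word is an `n`-byte string at a fixed offset, kept when its support within a cluster reaches the hypothetical threshold `min_support`; a word found in every message of a cluster is treated as a mandatory field and the remaining kept words as optional fields. All function names and parameters are illustrative, and the business/control preprocessing split is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical types for this sketch: a class key is the tuple of attribute
# values, and a feature word is an (offset, n-byte string) pair.
Key = Tuple[Optional[int], ...]
Word = Tuple[int, bytes]


def rough_set_partition(messages: List[bytes], positions: List[int]) -> Dict[Key, List[bytes]]:
    """Equivalence classes of the indiscernibility relation IND(B).

    Each attribute in B is 'byte value at offset p' (an assumption); two
    messages share a class iff they agree on every attribute. A message too
    short to have offset p gets None for that attribute.
    """
    classes: Dict[Key, List[bytes]] = defaultdict(list)
    for msg in messages:
        key = tuple(msg[p] if p < len(msg) else None for p in positions)
        classes[key].append(msg)
    return dict(classes)


def extract_feature_words(cluster: List[bytes], n: int, min_support: float) -> Dict[Word, float]:
    """Support of each n-byte word at each offset within one cluster.

    Each offset yields exactly one word per message, so a message adds at
    most 1 to any (offset, word) count; support is the fraction of messages
    carrying that word at that offset. Pairs below min_support are dropped.
    """
    counts: Dict[Word, int] = defaultdict(int)
    for msg in cluster:
        for off in range(len(msg) - n + 1):
            counts[(off, msg[off:off + n])] += 1
    total = len(cluster)
    return {w: c / total for w, c in counts.items() if c / total >= min_support}


def classify_fields(support: Dict[Word, float]) -> Tuple[List[Word], List[Word]]:
    """Mandatory: present in every message of the cluster; optional: the rest."""
    mandatory = sorted(w for w, s in support.items() if s == 1.0)
    optional = sorted(w for w, s in support.items() if s < 1.0)
    return mandatory, optional


def infer_formats(messages: List[bytes], positions: List[int],
                  n: int = 4, min_support: float = 0.5) -> Dict[Key, Tuple[List[Word], List[Word]]]:
    """Cluster, extract feature words per cluster, then split them into fields."""
    formats = {}
    for key, cluster in rough_set_partition(messages, positions).items():
        formats[key] = classify_fields(extract_feature_words(cluster, n, min_support))
    return formats


if __name__ == "__main__":
    msgs = [
        b"GET /index.html HTTP/1.1",
        b"GET /a.png HTTP/1.1",
        b"POST /login HTTP/1.1",
        b"POST /upload HTTP/1.0",
    ]
    formats = infer_formats(msgs, positions=[0])
    mandatory, optional = formats[(ord("G"),)]
    print(mandatory)  # [(0, b'GET '), (1, b'ET /')]
```

Anchoring words to fixed offsets suits fixed-layout binary headers. For text protocols whose fields vary in length, as in the toy HTTP lines above, offsets drift after the first variable field, so a fuller implementation would likely need sequence alignment or delimiter-aware tokenization. The attribute offsets are also chosen by hand here, whereas rough set theory would normally select them through attribute reduction.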

CLC number: TP398.08