Computer Science (计算机科学), 2019, Vol. 46, Issue (6): 80-89. doi: 10.11896/j.issn.1002-137X.2019.06.011

Section: Network and Communication · Special topic: Network Communication

Closed Sequential Patterns Mining Based Unknown Protocol Format Inference Method

ZHANG Hong-ze¹, HONG Zheng¹, WANG Chen², FENG Wen-bo¹, WU Li-fa¹

  1. Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210000, China
  2. Unit 32179 of PLA, Beijing 100000, China
  • Received: 2018-05-22  Published: 2019-06-24
  • Corresponding author: HONG Zheng (1979-), male, Ph.D., associate professor; main research interest: information security. E-mail: hz5215@163.com
  • About the authors: ZHANG Hong-ze (1993-), male, master's student, main research interest: information security; WANG Chen (1990-), male, M.S., research intern, main research interest: information security; FENG Wen-bo (1994-), male, master's student, main research interest: information security; WU Li-fa (1968-), male, Ph.D., professor, main research interest: information security.
  • Funding: Supported by the National Key R&D Program of China (2017YFB0802900).


Abstract: Existing protocol format inference methods based on network traffic extract only flat sequences of message keywords and ignore the structural features of messages, namely the sequential, parallel and hierarchical relations between keywords. In addition, noise in the message samples often lowers keyword identification accuracy. This paper presents a method that automatically identifies the keywords of unknown protocol messages and infers message structure. Working on communication messages collected from entity programs that implement the unknown protocol, the method applies a two-phase closed pattern mining strategy to perform closed sequential pattern mining on the messages; this identifies the protocol keywords and generates keyword sequences that capture how keywords combine. The method then extracts the sequential, parallel and hierarchical relations between keywords and uses them to infer message structure. Because a minimum support threshold is set during keyword identification, noisy message samples captured from real networks can be analyzed directly while keyword identification accuracy is preserved. Experimental results show that the proposed method performs well in both keyword identification and message structure inference for text and binary protocols.
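To make the roles of the minimum support threshold and of closedness concrete, here is a minimal Python sketch of closed *contiguous* sequential pattern mining over tokenized messages. It is not the authors' two-phase algorithm. The function names, the whitespace tokenization (which suits text protocols; binary protocols would need some byte-level tokenization instead), the restriction to contiguous patterns and the `max_len` cap are all illustrative assumptions. Support is taken to be the fraction of messages that contain a pattern, and a frequent pattern is kept only if no longer frequent pattern containing it has the same support.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contiguous_support(messages: Sequence[Sequence[str]], max_len: int) -> Dict[Pattern, int]:
    """Count how many messages contain each contiguous token run of length <= max_len."""
    support: Dict[Pattern, int] = defaultdict(int)
    for msg in messages:
        seen = set()  # count a pattern at most once per message
        for i in range(len(msg)):
            for j in range(i + 1, min(i + max_len, len(msg)) + 1):
                seen.add(tuple(msg[i:j]))
        for pattern in seen:
            support[pattern] += 1
    return dict(support)


def is_proper_subpattern(p: Pattern, q: Pattern) -> bool:
    """True if p occurs as a contiguous run inside the strictly longer pattern q."""
    n, m = len(p), len(q)
    return n < m and any(q[i:i + n] == p for i in range(m - n + 1))


def closed_contiguous_patterns(
    messages: Sequence[Sequence[str]], min_support: float, max_len: int = 4
) -> Dict[Pattern, int]:
    """Return frequent contiguous patterns with no super-pattern of equal support.

    Hypothetical helper; closedness is only checked against patterns of length <= max_len.
    """
    total = len(messages)
    support = contiguous_support(messages, max_len)
    # Minimum support threshold: rare runs (typically noise) are discarded here.
    frequent = {p: c for p, c in support.items() if c / total >= min_support}
    # A super-pattern with the same support as a frequent pattern is itself
    # frequent, so comparing against `frequent` alone is enough.
    return {
        p: c
        for p, c in frequent.items()
        if not any(c == c2 and is_proper_subpattern(p, q) for q, c2 in frequent.items())
    }


if __name__ == "__main__":
    raw = [
        "GET /a HTTP/1.1 Host: h1",
        "GET /b HTTP/1.1 Host: h2",
        "POST /c HTTP/1.1 Host: h3",
        "junk",  # a noise sample
    ]
    messages: List[List[str]] = [m.split() for m in raw]
    result = closed_contiguous_patterns(messages, min_support=0.5)
    print(sorted(result.items()))
    # [(('GET',), 2), (('HTTP/1.1', 'Host:'), 3)]
```

On this toy input the noise message `junk` falls below the threshold and is discarded, while `HTTP/1.1` and `Host:` collapse into the single closed pattern `('HTTP/1.1', 'Host:')` because they always co-occur. `POST`, which appears only once, is dropped as well. This illustrates the trade-off: a threshold high enough to suppress noise can also suppress rare genuine keywords.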

Key words: Closed sequential pattern mining, Message structure inference, Network traffic, Protocol format inference, Protocol reverse engineering

CLC number: TP398.08