Computer Science ›› 2019, Vol. 46 ›› Issue (6): 80-89. doi: 10.11896/j.issn.1002-137X.2019.06.011

Special Issue: Network and communication


Closed Sequential Patterns Mining Based Unknown Protocol Format Inference Method

ZHANG Hong-ze1, HONG Zheng1, WANG Chen2, FENG Wen-bo1, WU Li-fa1   

  1. Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210000, China
  2. Unit 32179 of PLA, Beijing 100000, China
  • Received: 2018-05-22 Published: 2019-06-24

Abstract: Current traffic-based protocol format inference methods can only extract a flat sequence of keywords; they do not consider the structural features of message keywords, such as the sequential, hierarchical and parallel relations between them. Additionally, noise in the message samples leads to low keyword recognition accuracy. This paper presents a method that automatically identifies the keywords of unknown protocol messages and infers the message structure. Starting from collected communication messages of the unknown protocol, the method applies two-phase closed sequential pattern mining to identify protocol keywords and generate keyword sequences that preserve keyword composition relations, extracts the sequential, hierarchical and parallel relations among the keywords, and then infers the message structure. To ensure keyword recognition accuracy, the method analyzes noisy message samples directly by setting a minimum support threshold in the keyword identification procedure. Experimental results show that the proposed method performs well in keyword identification and message structure inference for both text and binary protocols.
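
As a rough illustration of the idea behind keyword identification with a minimum support threshold, the sketch below mines closed contiguous patterns from a handful of toy messages. It is not the paper's two-phase algorithm: the function names, the brute-force enumeration, the `max_len` cap and the sample messages are all assumptions made for this example. A pattern is kept if it occurs in at least `min_support` (a fraction) of the messages and no longer frequent pattern containing it has the same support.

```python
from collections import defaultdict


def frequent_contiguous_patterns(messages, min_support, max_len=16):
    """Count contiguous byte substrings (up to max_len bytes) and keep those
    that occur in at least min_support (a fraction) of the messages."""
    threshold = min_support * len(messages)
    counts = defaultdict(int)
    for msg in messages:
        seen = set()  # count each pattern at most once per message
        for n in range(1, max_len + 1):
            for i in range(len(msg) - n + 1):
                seen.add(msg[i:i + n])
        for pattern in seen:
            counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= threshold}


def closed_patterns(frequent):
    """Drop every pattern that is contained in a longer frequent pattern
    with the same support; what remains is the closed set."""
    closed = {}
    for p, c in frequent.items():
        absorbed = any(
            len(q) > len(p) and p in q and frequent[q] == c
            for q in frequent
        )
        if not absorbed:
            closed[p] = c
    return closed


# Hypothetical samples: three text-protocol messages plus one noise message.
messages = [
    b"GET /a HTTP/1.1",
    b"GET /bc HTTP/1.1",
    b"GET /d HTTP/1.1",
    b"\x00\x17noise",
]

# A support of 0.7 tolerates the single noisy sample (3 of 4 messages suffice).
keywords = closed_patterns(frequent_contiguous_patterns(messages, 0.7))
for pattern, support in sorted(keywords.items()):
    print(pattern, support)
```

On this toy input the surviving candidates are `GET /` and ` HTTP/1.1`, each supported by three of the four messages; the noise message does not need to be filtered out beforehand because it simply fails to reach the threshold. Inferring sequential, hierarchical and parallel relations between such keywords is a separate step that this sketch does not attempt.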

Key words: Closed sequential patterns mining, Message structure inference, Network traffic, Protocol format inference, Protocol reverse engineering

CLC Number: TP398.08