Computer Science (计算机科学), 2020, Vol. 47, Issue (8): 313-318. doi: 10.11896/jsjkx.190700031
CHEN Qing-chao1, WANG Tao1, FENG Wen-bo2, YIN Shi-zhuang1, LIU Li-jun1
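Abstract: Format inference for unknown binary protocols often relies on a large amount of prior knowledge, involves complicated experimental procedures, and achieves low accuracy. To address this, the paper proposes a format inference method for unknown binary protocols that requires few manually set parameters, is simple to operate, and is more accurate. The preprocessed protocol data are hierarchically clustered, and the CH (Calinski-Harabasz) index is used as the evaluation criterion to select the optimal clustering. An improved sequence alignment is then applied to the clustering results to obtain protocol data sequences with gaps, and consecutive gaps are counted and merged to analyze the protocol format. Experimental results show that the proposed method infers more than 80% of the field boundaries of unknown binary protocols, and that its F1-Measure is about 30% higher overall than that of the format inference method in the AutoReEngine algorithm.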
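The clustering step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the preprocessing, the linkage criterion, or the distance measure, so the sketch assumes plain average-linkage agglomerative clustering with Euclidean distance on raw byte values, and the toy messages are made up for illustration. It only shows how the Calinski-Harabasz index can be used to pick one level of a cluster hierarchy.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ch_index(clusters):
    """Calinski-Harabasz index (B / (k - 1)) / (W / (n - k)), for 2 <= k < n."""
    points = [p for c in clusters for p in c]
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    if within == 0:
        return math.inf  # convention of this sketch for zero within-cluster scatter
    return (between / (k - 1)) / (within / (n - k))


def average_link(a, b):
    """Average pairwise Euclidean distance between two clusters (assumed linkage)."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def partitions(points):
    """Yield the agglomerative partition at each level from n - 1 down to 2 clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > 2:
        # Merge the two clusters with the smallest average-link distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_link(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        yield [list(c) for c in clusters]


# Hypothetical toy messages (not from the paper): two message types, three each.
messages = [bytes.fromhex(h) for h in
            ("0100000a", "0100000b", "0100000c", "0200ff10", "0200ff11", "0200ff12")]
points = [list(m) for m in messages]  # assumption: each byte is one numeric feature

# Keep the hierarchy level with the highest CH index (needs at least 3 messages).
best = max(partitions(points), key=ch_index)
for cluster in best:
    print([bytes(p).hex() for p in cluster])
```

On this toy input the CH index peaks at two clusters, one per message type. Selecting the level by an internal index such as CH avoids fixing the number of clusters by hand, which matches the abstract's stated goal of keeping manually set parameters few.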