Unknown Binary Protocol Format Inference Method Based on Longest Continuous Interval

CHEN Qing-chao¹, WANG Tao¹, FENG Wen-bo², YIN Shi-zhuang¹, LIU Li-jun¹

  1. Equipment Simulation Training Center, Army Engineering University, Shijiazhuang 050003, China
  2. College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Online: 2020-08-15 Published: 2020-08-10
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2017YFB0802900) and the Natural Science Foundation of Jiangsu Province, China (BK20161469).

Abstract: Format inference for unknown binary protocols often relies on a large amount of prior knowledge, involves complex experimental procedures, and yields low accuracy. To address this, a format inference method is proposed that requires fewer manually set parameters, is simpler to operate, and achieves higher accuracy. The preprocessed protocol data is clustered hierarchically, and the optimal clustering is selected using the CH (Calinski-Harabasz) coefficient as the evaluation criterion. An improved sequence alignment is then applied to the clustering results to obtain protocol data sequences with intervals; continuous intervals are counted and merged to analyze the protocol format. Experimental results show that the proposed method can infer more than 80% of the field intervals in an unknown binary protocol. Compared with the format inference method in the AutoReEngine algorithm, the F1-Measure of the proposed method improves by about 30% overall.
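To illustrate the idea of counting and merging continuous intervals, the following is a minimal sketch and not the paper's actual algorithm. It assumes the messages of one cluster are already aligned byte for byte, marks each offset as constant or variable across the cluster, and merges adjacent offsets with the same mark into one interval. The function names `constant_mask` and `continuous_intervals` and the sample messages are hypothetical.

```python
from typing import List, Tuple


def constant_mask(messages: List[bytes]) -> List[bool]:
    """Return True at offset i when every message has the same byte at i.

    Assumption: messages are already aligned; only the common prefix
    length (the shortest message) is compared.
    """
    if not messages:
        return []
    length = min(len(m) for m in messages)
    return [len({m[i] for m in messages}) == 1 for i in range(length)]


def continuous_intervals(mask: List[bool]) -> List[Tuple[int, int, bool]]:
    """Merge adjacent offsets sharing the same flag.

    Each result is (start, end, is_constant) with end exclusive.
    """
    intervals = []
    start = 0
    for i in range(1, len(mask) + 1):
        # Close the current run at the end of the mask or when the flag flips.
        if i == len(mask) or mask[i] != mask[start]:
            intervals.append((start, i, mask[start]))
            start = i
    return intervals


# Hypothetical cluster: a 3-byte constant header followed by 3 varying bytes.
cluster = [
    b"\x01\x02\x00\x10\xaa\xbb",
    b"\x01\x02\x00\x20\xcc\xdd",
    b"\x01\x02\x00\x30\xee\xff",
]
candidate_fields = continuous_intervals(constant_mask(cluster))
```

Here each merged interval is a candidate field boundary: a run of constant offsets suggests a fixed field such as a header or magic number, and a run of variable offsets suggests payload or counter fields. A real implementation would work on the gapped output of the sequence alignment rather than on raw offsets.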

Key words: Binary protocol, Format inference, Hierarchical clustering, Interval, Sequence alignment

CLC Number: 

  • TP393
