Computer Science, 2020, Vol. 47, Issue (12): 332-335. doi: 10.11896/jsjkx.190900116

• Information Security •

Chain Merging Method for Unknown Text Protocol Candidate Keyword Stored in Multi-level Dictionary

CHEN Qing-chao1, WANG Tao1, YIN Shi-zhuang1, FENG Wen-bo2   

  1. 1 Equipment Simulation Training Center, Army Engineering University, Shijiazhuang 050003, China
    2 College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2019-09-17 Revised: 2019-11-21 Published: 2020-12-17
  • Corresponding author: WANG Tao (a13592247640@foxmail.com)
  • Author contact: cqc62808@163.com
  • About author: CHEN Qing-chao, born in 1996, postgraduate. His main research interests include cyber security.
    WANG Tao, born in 1964, Ph.D, professor. His main research interests include cyber security and cryptography.
  • Supported by:
    National Key Research and Development Program of China (2017YFB0802900) and Natural Science Foundation of Jiangsu Province, China (BK20161469).

Abstract: Keyword extraction is a key step in the reverse engineering of unknown network protocols. Existing keyword extraction methods suffer from low accuracy, cumbersome operation, and a heavy reliance on prior knowledge. Therefore, an automatic keyword extraction algorithm based on location information is proposed. First, candidate keywords are obtained by Trigram word segmentation. After location information is attached, these keywords are organized into a multi-level dictionary. On this basis, the traditional tree merging of candidate keywords is improved to chain merging according to the location information, so as to obtain more precise longest candidate keywords. Experimental results show that, when the frequency threshold is set to 0.6, the method can accurately extract the keywords of text protocols. In addition, the influence of the frequency setting on the experimental results is analyzed, and the limitations of related algorithms that mine keywords based on frequent sequences are discussed.

Key words: Chain, Keyword extraction, Location information, Multi-level dictionary, Trigram, Unknown text protocol
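
The pipeline summarized in the abstract (Trigram segmentation, a position-indexed multi-level dictionary, a frequency threshold, and chain merging) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the two-level layout (trigram, then start offset, then message ids), the rule that a chain link needs a two-character overlap at the next offset, and the sample messages are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(messages):
    """Two-level dictionary (assumed layout): trigram -> start offset -> message ids."""
    index = defaultdict(lambda: defaultdict(set))
    for msg_id, msg in enumerate(messages):
        for pos in range(len(msg) - 2):
            index[msg[pos:pos + 3]][pos].add(msg_id)
    return index


def frequent_trigrams(index, n_messages, threshold):
    """Keep (offset, trigram) entries whose share of messages reaches the threshold."""
    return {
        (pos, gram): ids
        for gram, by_pos in index.items()
        for pos, ids in by_pos.items()
        if len(ids) / n_messages >= threshold
    }


def chain_merge(frequent, n_messages, threshold):
    """Link frequent trigrams at consecutive offsets into longer candidates."""
    consumed, merged = set(), []
    for start in sorted(frequent):  # leftmost offsets start chains first
        if start in consumed:
            continue
        pos, word = start
        support = set(frequent[start])
        consumed.add(start)
        next_pos = pos + 1
        while True:
            # Successors sit at the next offset, overlap the chain by two
            # characters, and keep the shared support above the threshold.
            options = [
                (len(support & ids), gram)
                for (p, gram), ids in frequent.items()
                if p == next_pos and gram[:2] == word[-2:]
                and len(support & ids) / n_messages >= threshold
            ]
            if not options:
                break
            _, gram = max(options)  # assumed tie-break: largest shared support
            support &= frequent[(next_pos, gram)]
            consumed.add((next_pos, gram))
            word += gram[2]
            next_pos += 1
        merged.append((pos, word, len(support)))
    return merged


# Hypothetical HTTP-like request lines, used only to exercise the sketch.
messages = [
    "GET /a HTTP/1.1",
    "GET /b HTTP/1.1",
    "GET /c HTTP/1.1",
    "POST /d HTTP/1.1",
    "HEAD /e HTTP/1.1",
]
index = build_index(messages)
frequent = frequent_trigrams(index, len(messages), 0.6)
candidates = chain_merge(frequent, len(messages), 0.6)
print(candidates)
```

Each result is a tuple of start offset, merged candidate, and the number of supporting messages. On these five made-up messages the sketch reports `GET /` at offset 0 and ` HTTP/1.1` at offset 6, each supported by three messages. The `HTTP/1.1` occurrences in the `POST` and `HEAD` lines start one character later, so in this sketch they are counted separately and fall below the 0.6 threshold.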

CLC Number: 

  • TP393