Computer Science (计算机科学), 2023, Vol. 50, Issue (6): 330-337. doi: 10.11896/jsjkx.220700073
魏涛, 李志华, 王长杰, 程顺航
WEI Tao, LI Zhihua, WANG Changjie, CHENG Shunhang
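Abstract: Efficiently mining threat intelligence from open-source cybersecurity reports remains a challenge. To address it, this paper proposes TI-NER-IM (TI-NER-based Intelligence Mining), a threat intelligence mining method. TI-NER-IM is built on a Threat Intelligence Named Entity Recognition (TI-NER) algorithm.

First, IoT security reports from roughly the past 10 years are collected and annotated to build a threat intelligence entity recognition dataset.

Second, traditional entity recognition models fall short at mining Indicators of Compromise (IoC) in threat intelligence. To address this, the paper proposes the Threat Intelligence Entity Identification based on Self-attention Mechanism and Character Embedding (TIEI-SMCE) model. The model fuses character embedding information with word representations. It then uses a self-attention mechanism to capture latent dependency weights between words, along with contextual and other features. This allows IoC entities to be identified accurately.

Third, a threat intelligence named entity recognition algorithm is proposed on the basis of the TIEI-SMCE model.

Finally, the model and algorithm are integrated into a new threat intelligence mining method.

TI-NER-IM can automatically mine threat intelligence IoC entities from unstructured and semi-structured cybersecurity reports. Experimental results show that TI-NER-IM improves the F1 score by 1.43% compared with the BERT-BiLSTM-CRF model.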
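The abstract describes TIEI-SMCE only at a high level: character-level information is fused with word representations, and self-attention then weighs dependencies between words. Below is a minimal pure-Python sketch of that general idea. It is not the authors' implementation, and it rests on several assumptions:

- Embeddings are seeded random stand-ins rather than trained vectors.
- Character vectors are max-pooled per word.
- Word and character vectors are concatenated.
- Self-attention uses identity query/key/value projections instead of learned ones.

The helper names (`make_table`, `char_embedding`, `encode`), the toy dimensions, and the example sentence are hypothetical. One plausible motivation for the character view is that IoC strings such as IP addresses or CVE IDs are rarely in a word vocabulary; this is assumed here, not stated in the abstract.

```python
import math
import random
import string

# Toy sizes chosen for illustration; the abstract does not give the paper's dimensions.
WORD_DIM = 4
CHAR_DIM = 3

Vector = list[float]


def make_table(keys, dim: int, seed: int) -> dict[str, Vector]:
    """Hypothetical stand-in for trained embeddings: seeded random vectors per key."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for k in keys}


def char_embedding(word: str, char_table: dict[str, Vector]) -> Vector:
    """Max-pool a word's character vectors into one fixed-size vector (assumed pooling)."""
    vecs = [char_table[c] for c in word]
    return [max(column) for column in zip(*vecs)]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X: list[Vector]) -> tuple[list[Vector], list[Vector]]:
    """Scaled dot-product self-attention with identity Q/K/V projections (a simplification)."""
    d = len(X[0])
    weights = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights.append(softmax(scores))
    # Each output row is the attention-weighted sum of all token vectors.
    outputs = [[sum(w * v[j] for w, v in zip(row, X)) for j in range(d)] for row in weights]
    return outputs, weights


def encode(tokens, word_table, char_table, unk):
    """Concatenate word and character vectors per token, then apply self-attention."""
    fused = []
    for tok in tokens:
        w = word_table.get(tok.lower(), unk)  # out-of-vocabulary tokens (e.g. IPs) get unk
        c = char_embedding(tok, char_table)   # character view still tells them apart
        fused.append(w + c)                   # list concatenation: WORD_DIM + CHAR_DIM values
    return self_attention(fused)


tokens = "APT28 used 192.168.1.10 to exploit CVE-2021-44228".split()
word_table = make_table(["used", "to", "exploit"], WORD_DIM, seed=0)
char_table = make_table(string.printable, CHAR_DIM, seed=1)
unk = [0.0] * WORD_DIM
outputs, weights = encode(tokens, word_table, char_table, unk)
print(len(outputs), len(outputs[0]))  # 6 7
```

In a complete tagger, the attention outputs would feed an output layer that assigns entity labels to each token. The BERT-BiLSTM-CRF baseline, for example, ends in a CRF. The abstract does not specify TIEI-SMCE's output layer, so none is shown here.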