Computer Science, 2023, Vol. 50, Issue (6): 330-337. doi: 10.11896/jsjkx.220700073

• Computer Network •

Cybersecurity Threat Intelligence Mining Algorithm for Open Source Heterogeneous Data

WEI Tao, LI Zhihua, WANG Changjie, CHENG Shunhang   

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2022-07-07 Revised: 2022-09-06 Online: 2023-06-15 Published: 2023-06-06
  • About author: WEI Tao, born in 1998, postgraduate. His main research interests include information system analysis and information security. LI Zhihua, born in 1969, Ph.D, professor, master supervisor. His main research interests include the key technologies and information security of the end-edge-cloud, and its intersection with cutting-edge disciplines such as artificial intelligence.
  • Supported by:
    Intelligent Manufacturing Project of the Ministry of Industry and Information Technology (ZH-XZ-180004) and Fundamental Research Funds for the Central Universities of Ministry of Education of China (JUSRP211A41, JUSRP42003).

Abstract: To efficiently mine threat intelligence from open source network security reports, a TI-NER-based intelligence mining (TI-NER-IM) method is proposed. Firstly, nearly 10 years of IoT cybersecurity reports are collected and annotated to construct a threat intelligence entity identification dataset. Secondly, because traditional entity recognition models perform poorly in threat intelligence mining, a threat intelligence entity identification model based on self-attention mechanism and character embedding (TIEI-SMCE) is proposed. The model fuses character embedding information and then uses the self-attention mechanism to capture the potential dependency weights between words, contexts and other characteristics, so that threat intelligence entities can be identified accurately. Thirdly, a threat intelligence named entity recognition (TI-NER) algorithm based on the TIEI-SMCE model is proposed. Finally, the TI-NER-IM method is designed on the basis of the TI-NER algorithm; it enables automated mining of threat intelligence from unstructured and semi-structured security reports. Experimental results show that, compared with the BERT-BiLSTM-CRF model, the F1 value of TI-NER-IM increases by 1.43%.
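
The abstract does not spell out how TIEI-SMCE computes its dependency weights, so the following is only a minimal sketch of the standard scaled dot-product self-attention that such models commonly build on. The function names, the toy tokens and the three-dimensional embeddings are all hypothetical, and the learned query/key/value projections are replaced by the identity as a simplifying assumption.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention over a sequence of vectors.

    Assumption: queries, keys and values are the embeddings themselves
    (identity projections); a trained model would learn these projections.
    Returns (weights, outputs), where weights[i][j] is how strongly
    token i attends to token j.
    """
    dim = len(embeddings[0])
    weights = []
    for query in embeddings:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[j] for w, value in zip(row, embeddings)) for j in range(dim)]
        for row in weights
    ]
    return weights, outputs


# Hypothetical toy input: four tokens with made-up 3-dimensional embeddings.
tokens = ["Mirai", "botnet", "uses", "198.51.100.7"]
embeddings = [
    [1.0, 0.0, 0.5],
    [0.9, 0.1, 0.4],
    [0.0, 1.0, 0.0],
    [0.2, 0.1, 1.0],
]
weights, outputs = self_attention(embeddings)

for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and can be read as a distribution over the other tokens in the sentence; in a tagger, the re-weighted vectors in `outputs` would then be passed to a label decoder.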

Key words: Threat intelligence mining, Natural language processing, Entity extraction, Indicators of compromise

CLC Number: 

  • TP393.08