Computer Science ›› 2020, Vol. 47 ›› Issue (7): 292-298. doi: 10.11896/jsjkx.190600156

• Information Security •

Network Representation Learning Algorithm Based on Vulnerability Threat Schema

HUANG Yi1,2, SHEN Guo-wei1,2, ZHAO Wen-bo1, GUO Chun1,2

  1 Department of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  2 Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
  • Received: 2019-06-19 Online: 2020-07-15 Published: 2020-07-16
  • Corresponding author: SHEN Guo-wei (gwshen@gzu.edu.cn)
  • About author: HUANG Yi, born in 1997, postgraduate, is a member of China Computer Federation. Her main research interests include representation learning and network security.
    SHEN Guo-wei, born in 1986, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include cyberspace security and big data.
  • Author e-mail: yHuang_Addy@163.com
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61802081), the Science and Technology Major Project of Guizhou Province (20183001), and the Guizhou Provincial Science and Technology Plan (20161052, 20171051).

Abstract: Threat intelligence analysis supplies useful information for network attack and defense. Fine-grained mining of the security entities in cyber threat intelligence data, and of the relationships between those entities, is an active topic in threat intelligence research. When applied to large-scale threat intelligence data, traditional machine learning algorithms face sparse, high-dimensional data and therefore struggle to capture network information effectively. To address the classification of network security vulnerabilities, this paper proposes HSEN2vec, a network representation learning algorithm based on a vulnerability threat schema. The algorithm aims to capture as much of the structural and semantic information of the heterogeneous security entity network as possible and to learn low-dimensional vector representations of the security entities. It first obtains the structural information of the heterogeneous security entity network based on the vulnerability threat schema, then models that information with the Skip-gram model, using negative sampling for efficient prediction to obtain the final vector representations. Experiments on national security vulnerability data show that, compared with other methods, the proposed algorithm improves accuracy and other evaluation metrics for vulnerability classification.

Key words: Network representation learning, Heterogeneous security entity network, Threat schema, Vulnerability

CLC number:

  • TP393.0
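
The pipeline summarized in the abstract (schema-guided walks over a heterogeneous network, followed by Skip-gram training with negative sampling) can be illustrated with a minimal sketch. This is not the authors' HSEN2vec implementation. The toy graph, the entity types (V for vulnerability, P for product, A for attack type), the type pattern, the uniform negative sampler, and all function names and hyperparameters below are assumptions made for illustration only.

```python
import random
import numpy as np

# Hypothetical toy network: V = vulnerability, P = product, A = attack type.
NODE_TYPE = {"v1": "V", "v2": "V", "v3": "V", "p1": "P", "p2": "P", "a1": "A"}
EDGES = [("v1", "p1"), ("v2", "p1"), ("v3", "p2"),
         ("v1", "a1"), ("v2", "a1"), ("v3", "a1")]


def build_adjacency(edges):
    """Return an undirected adjacency list: node -> list of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def schema_walk(adj, node_type, start, schema, length, rng):
    """Random walk whose node types follow a cyclic type pattern.

    `schema` starts and ends with the same type (e.g. ["V", "P", "V"]),
    so the pattern can repeat. The walk stops early at a dead end.
    """
    pattern = schema[:-1]
    walk, pos = [start], 0
    while len(walk) < length:
        pos = (pos + 1) % len(pattern)
        candidates = [n for n in adj.get(walk[-1], [])
                      if node_type[n] == pattern[pos]]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


def train_sgns(walks, nodes, dim=8, window=2, negatives=3,
               lr=0.05, epochs=20, seed=0):
    """Skip-gram with negative sampling (uniform negatives, a simplification)."""
    rng = np.random.default_rng(seed)
    index = {n: i for i, n in enumerate(nodes)}
    w_in = rng.normal(scale=0.1, size=(len(nodes), dim))   # node vectors
    w_out = np.zeros((len(nodes), dim))                    # context vectors
    for _ in range(epochs):
        for walk in walks:
            ids = [index[n] for n in walk]
            for i, center in enumerate(ids):
                for j in range(max(0, i - window), min(len(ids), i + window + 1)):
                    if j == i:
                        continue
                    samples = [(ids[j], 1.0)]
                    samples += [(int(t), 0.0)
                                for t in rng.integers(0, len(nodes), size=negatives)
                                if int(t) != ids[j]]
                    grad_center = np.zeros(dim)
                    for target, label in samples:
                        score = 1.0 / (1.0 + np.exp(-w_in[center] @ w_out[target]))
                        step = lr * (label - score)
                        grad_center += step * w_out[target]
                        w_out[target] += step * w_in[center]
                    w_in[center] += grad_center
    return {n: w_in[index[n]] for n in nodes}


adjacency = build_adjacency(EDGES)
walk_rng = random.Random(42)
starts = [n for n, t in NODE_TYPE.items() if t == "V"]
walks = [schema_walk(adjacency, NODE_TYPE, s, ["V", "P", "V"], 7, walk_rng)
         for s in starts for _ in range(10)]
embeddings = train_sgns(walks, sorted(NODE_TYPE))
```

The resulting `embeddings` dictionary maps each node name to a low-dimensional vector; in a setting like the one the abstract describes, the vulnerability vectors would then be passed to a separate classifier. Restricting each step to neighbours of the type the pattern expects is what keeps the walk on schema-consistent paths rather than wandering freely across entity types.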