Fast and Transmissible Domain Knowledge Graph Construction Method

doi:10.11896/jsjkx.210900018

Abstract

Abstract: A domain knowledge graph represents the relations among domain entities clearly and visually, and supports efficient, accurate acquisition of domain knowledge. Constructing domain knowledge graphs helps advance informatization in the related fields, but it demands substantial manpower and time from domain experts, and the result is difficult to migrate to other domains. To reduce manual cost and improve the generality of knowledge graph construction, this paper proposes a general method for constructing domain knowledge graphs that does not rely on large amounts of manual ontology construction or data labeling. The graph is built in four steps: domain dictionary construction, data acquisition and cleaning, entity maintenance and linking, and graph updating and visualization. The paper takes the network security domain as an example and describes the construction process in detail. In addition, to ensure the domain relevance of the information in the graph, the paper proposes a fusion model based on a BERT (Bidirectional Encoder Representations from Transformers) transfer model and an attention mechanism. In text classification, this model achieves an F1 score of 87.14% and an accuracy of 93.51%.

Key words: Entity classification, Knowledge graph construction, Network security, Text classification

CLC Number: TP391

DENG Kai, YANG Pin, LI Yi-zhou, YANG Xing, ZENG Fan-rui, ZHANG Zhen-yu. Fast and Transmissible Domain Knowledge Graph Construction Method[J]. Computer Science, 2022, 49(6A): 100-108. https://doi.org/10.11896/jsjkx.210900018


