Computer Science ›› 2022, Vol. 49 ›› Issue (6A): 100-108. doi: 10.11896/jsjkx.210900018

• Intelligent Computing •

Fast and Transmissible Domain Knowledge Graph Construction Method

DENG Kai, YANG Pin, LI Yi-zhou, YANG Xing, ZENG Fan-rui, ZHANG Zhen-yu   

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610207, China
  • Online: 2022-06-10 Published: 2022-06-08
  • About author: DENG Kai, born in 1996, postgraduate. His main research interests include knowledge graphs and network application security.
    YANG Pin, born in 1967, Ph.D., professor. His main research interests include software security and network security.

Abstract: A domain knowledge graph represents the relations among domain entities clearly and visually, and supports efficient and accurate knowledge acquisition. Constructing domain knowledge graphs helps promote the development of information technology in related fields, but it requires substantial expert manpower and time, and the result is difficult to migrate to other fields. To reduce manpower cost and improve the generality of knowledge graph construction, this paper proposes a general construction method for domain knowledge graphs that does not rely on large amounts of manual ontology construction and data annotation. The domain knowledge graph is constructed in four steps: domain dictionary construction, data acquisition and cleaning, entity linking and maintenance, and graph updating and visualization. Taking the network security domain as an example, this paper constructs a knowledge graph and details the construction process. In addition, to improve the domain relevance of entities in the knowledge graph, this paper proposes a fusion model based on BERT (Bidirectional Encoder Representations from Transformers) and an attention mechanism. In text classification, the model achieves an F-score of 87.14% and an accuracy of 93.51%.
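
The abstract reports both an F-score and an accuracy for text classification, and the two can differ noticeably when classes are imbalanced. The sketch below illustrates the standard definitions for a binary classifier. The function name `classification_metrics` and the confusion counts are hypothetical and are not taken from the paper, and the paper's exact averaging scheme for the F-score is not stated in the abstract.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    # Accuracy counts every correct decision, including true negatives.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision and recall look only at the positive (e.g. in-domain) class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (not the paper's data):
# 50 in-domain texts and 150 out-of-domain texts.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=140)
print(metrics)
```

With these illustrative counts the majority negative class lifts accuracy above F1, which is one plausible reason the two reported figures are not equal.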

Key words: Entity classification, Knowledge graph construction, Network security, Text classification

CLC Number: 

  • TP391