Computer Science ›› 2019, Vol. 46 ›› Issue (11): 209-215. doi: 10.11896/jsjkx.181001939

• Artificial Intelligence •

Knowledge Reasoning Method Based on Unstructured Text-enhanced Association Rules

LI Zhi-xing1,2, REN Shi-ya1,2, WANG Hua-ming1,2, SHEN Ke1

  1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2. Chongqing Key Lab of Computation Intelligence, Chongqing 400065, China
  • Received: 2018-10-18 Online: 2019-11-15 Published: 2019-11-14
  • Corresponding author: LI Zhi-xing (born 1985), male, Ph.D., associate professor. Main research interests: natural language processing and machine learning. E-mail: lizx@cqupt.edu.cn
  • About the authors: REN Shi-ya (born 1994), female, M.S. Main research interests: natural language processing and knowledge graphs. WANG Hua-ming (born 1995), male, M.S. Main research interests: natural language processing and multi-granularity computing. SHEN Ke (born 1996), female. Main research interest: natural language processing.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2016QY01W0200), the Young Scientists Fund of the National Natural Science Foundation of China (61502066), and the Chongqing Basic and Frontier Research Program (cstc2015jcyjA40018).

Abstract: Knowledge bases (KBs) store entities, their attributes, and the relations between entities in a structured form. Because this knowledge is easy for computers to process, KBs play a vital role in many natural language processing (NLP) tasks. Although existing KBs already contain a massive number of triple facts in absolute terms, they still cover far less than the knowledge that exists in the real world. How to complete KBs with more high-quality knowledge has therefore become an active research topic. Existing approaches fall into two main classes, internal reasoning and extraction from external resources, and both leave considerable room for improvement. On the one hand, the knowledge inside a KB contains errors and omissions, so reasoning over it can propagate those errors. On the other hand, existing extraction methods concentrate on a limited range of knowledge, such as entity types and relations, so what they extract from external resources such as text is not comprehensive. To address these problems, this paper proposes a knowledge reasoning method based on unstructured text-enhanced association rules. The method first abstracts text representation patterns from unstructured text, then represents them as a bag of word distributions, and finally combines them with the knowledge already in the KB to build association rules. Unlike traditional association rules, the rules obtained by the proposed method can be matched directly against unstructured text to perform knowledge reasoning. Experimental results show that, compared with traditional methods, the proposed method efficiently infers triple knowledge from unstructured text in larger quantity and with higher quality.

Key words: Association rules, Knowledge base completion, Knowledge reasoning, Text-enhanced, Triple knowledge
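As a rough illustration of the idea summarized in the abstract, the sketch below mines rules of the form "text pattern ⇒ relation" from a toy corpus and a toy KB, then matches them against text to propose new triples. It is not the authors' algorithm: the data, the function names (`pattern`, `mine_rules`, `infer`), and the thresholds are all hypothetical, and a plain set of words stands in for the paper's bag of word distributions. Scoring uses the textbook support and confidence measures.

```python
from collections import Counter

# Toy knowledge base of (head, relation, tail) triples; all values are made up.
KB = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Lyon", "located_in", "France"),
}

# Toy corpus: (head entity, tail entity, text between the two mentions).
CORPUS = [
    ("Paris", "France", "is the capital of"),
    ("Berlin", "Germany", "is the capital of"),
    ("Lyon", "France", "is a city in"),
    ("Rome", "Italy", "is the capital of"),
]


def pattern(text):
    """Reduce a text span to a crude pattern: its set of lowercase words.

    Assumption: this is a stand-in for a richer text representation.
    """
    return frozenset(text.lower().split())


def mine_rules(kb, corpus, min_support=2, min_confidence=0.6):
    """Mine rules 'pattern => relation' using standard support and confidence."""
    relations = {}
    for head, rel, tail in kb:
        relations.setdefault((head, tail), set()).add(rel)

    pattern_count = Counter()  # how often each pattern occurs in the corpus
    joint_count = Counter()    # how often a pattern co-occurs with a KB relation
    for head, tail, text in corpus:
        p = pattern(text)
        pattern_count[p] += 1
        for rel in relations.get((head, tail), ()):
            joint_count[(p, rel)] += 1

    rules = {}
    for (p, rel), support in joint_count.items():
        # Pairs missing from the KB count against confidence (a simplification).
        confidence = support / pattern_count[p]
        if support >= min_support and confidence >= min_confidence:
            rules[(p, rel)] = confidence
    return rules


def infer(rules, kb, corpus):
    """Match mined rules against the corpus and return triples not yet in the KB."""
    new_triples = set()
    for head, tail, text in corpus:
        p = pattern(text)
        for rule_pattern, rel in rules:
            if rule_pattern == p and (head, rel, tail) not in kb:
                new_triples.add((head, rel, tail))
    return new_triples


rules = mine_rules(KB, CORPUS)
print(infer(rules, KB, CORPUS))
```

In this toy setting the only rule that passes both thresholds links the "is the capital of" pattern to `capital_of`, and matching it against the corpus proposes one triple for the Rome/Italy pair, which the KB lacks. Exact set equality between patterns is a deliberate simplification; a distribution-based representation would allow softer matching.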

CLC Number:

  • TP391