Computer Science, 2019, Vol. 46, Issue (8): 272-276. doi: 10.11896/j.issn.1002-137X.2019.08.045

• Artificial Intelligence •

Natural Language Querying with External Semantic Enrichment

FENG Xue

  1. (Computer School, Beijing Information Science and Technology University, Beijing 100192, China)
  • Received: 2018-09-26 Online: 2019-08-15 Published: 2019-08-15
  • About the author: FENG Xue (born 1984), female, Ph.D., lecturer. Her main research interests include digital rights protection technology and natural language processing. E-mail: fengxue@bistu.edu.cn
  • Funding:
    National Key Research and Development Program of China (2018YFB1004100)

Abstract: The Semantic Web is an extremely important class of resources built on Internet technology. At present, querying a semantic web supports only formal query languages, which require users to strictly observe a specific syntax, so only experts familiar with semantic web systems and formal languages can query it correctly. To overcome this limitation, this paper presents an unsupervised natural language querying system that automatically converts natural language sentences into the formal language statements supported by semantic web querying, making it convenient for non-expert (ordinary) users. The system first automatically extracts all entities and attributes in a given sentence according to the semantic web, then connects these entities and attributes to form a semantic relationship graph, and finally uses a heuristic strategy to search the graph for an optimal path, which is converted into a SPARQL statement. The most critical part of the system is the coverage of entities and attributes in the semantic web, which directly determines the quality of the intermediate semantic relationship graph and thus affects the final performance of the system. To make the system more practical, the paper further uses knowledge from an external semantic web to enrich the human-annotated, domain-specific semantic web, so that more of the information carried by a natural language sentence can be captured. This yields better semantic relationship graphs and more accurate SPARQL statements. Finally, the system and the proposed improvement are evaluated on the American geography question set, which contains manually annotated SPARQL statements for 880 questions and is a widely accepted dataset in natural language querying research. The experimental results show that the baseline system correctly answers 77.6% of the questions, significantly outperforming the best unsupervised system in the literature; after enrichment with external semantic knowledge, the answer accuracy reaches 78.5%.

Key words: Formal language, Natural language querying, Semantic Web, SPARQL, Unsupervised learning
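
The pipeline described in the abstract (extract entities and attributes, link them into a semantic relationship graph, choose a path heuristically, emit SPARQL) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `LEXICON` and `SCHEMA` tables, the `geo:` prefix and its namespace URL, the function names, and the distance-based heuristic are all hypothetical, and the "path" is reduced to a single entity-attribute edge.

```python
from itertools import product

# Hypothetical toy lexicon: surface word -> (kind, ontology identifier).
LEXICON = {
    "texas": ("entity", "geo:texas"),
    "capital": ("attribute", "geo:capital"),
    "population": ("attribute", "geo:population"),
}

# Hypothetical schema: (entity, attribute) pairs that exist in the toy semantic web.
SCHEMA = {("geo:texas", "geo:capital"), ("geo:texas", "geo:population")}

# Hypothetical namespace, used only so that the emitted query is well-formed.
PREFIX = "PREFIX geo: <http://example.org/geo#>"


def extract_mentions(question):
    """Step 1: find lexicon words in the question, keeping their token positions."""
    tokens = question.lower().rstrip("?").split()
    return [(i, *LEXICON[t]) for i, t in enumerate(tokens) if t in LEXICON]


def build_graph(mentions):
    """Step 2: link each entity mention to each attribute mention the schema allows."""
    entities = [m for m in mentions if m[1] == "entity"]
    attributes = [m for m in mentions if m[1] == "attribute"]
    edges = []
    for (ei, _, ent), (ai, _, attr) in product(entities, attributes):
        if (ent, attr) in SCHEMA:
            # Edge weight: token distance between the two mentions.
            edges.append((ent, attr, abs(ei - ai)))
    return edges


def best_path(edges):
    """Step 3: assumed heuristic, prefer the edge whose mentions are closest together."""
    return min(edges, key=lambda e: e[2]) if edges else None


def to_sparql(path):
    """Step 4: render the chosen (entity, attribute) edge as a SPARQL query string."""
    ent, attr, _ = path
    return f"{PREFIX}\nSELECT ?x WHERE {{ {ent} {attr} ?x . }}"


def answer_query(question):
    """Run the four steps; return None when no entity-attribute edge is found."""
    path = best_path(build_graph(extract_mentions(question)))
    return to_sparql(path) if path else None


if __name__ == "__main__":
    print(answer_query("What is the capital of Texas?"))
```

In this sketch the coverage issue raised in the abstract shows up directly: a question whose words are missing from `LEXICON` produces no edges and therefore no query, which is the kind of gap that enrichment with external semantic knowledge is meant to reduce.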

CLC number:

  • TP391