计算机科学 ›› 2021, Vol. 48 ›› Issue (11): 276-286.doi: 10.11896/jsjkx.210100218

• 人工智能 • 上一篇    下一篇

基于容错Earley解析算法的领域语义文法自动学习方法

马一帆1, 马涛涛2, 方芳3, 王石2, 唐素勤4, 曹存根2   

  1. 1 广西师范大学计算机科学与信息工程学院 广西 桂林541000
    2 中国科学院计算技术研究所 北京100190
    3 中国科学院信息工程研究所 北京100190
    4 广西师范大学教育学部教育技术系 广西 桂林541000
  • 收稿日期:2021-01-28 修回日期:2021-05-19 出版日期:2021-11-15 发布日期:2021-11-10
  • 通讯作者: 曹存根(cgcao@ict.ac.cn)
  • 作者简介:1363832871@qq.com
  • 基金资助:
    科技部重点研发计划课题(2017YFC1700302);北京市科技新星计划交叉学科合作课题(Z191100001119014);国家重点研发计划重点专项(2017YFB1002300);国家自然科学基金(61967002)

Automatic Learning Method of Domain Semantic Grammar Based on Fault-tolerant Earley Parsing Algorithm

MA Yi-fan1, MA Tao-tao2, FANG Fang3, WANG Shi2, TANG Su-qin4, CAO Cun-gen2   

  1. 1 School of Computer Science and Information Engineering,Guangxi Normal University,Guilin,Guangxi 541000,China
    2 Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190,China
    3 Institute of Information Engineering,Chinese Academy of Sciences,Beijing 100190,China
    4 Department of Educational Technology,Faculty of Education,Guangxi Normal University,Guilin,Guangxi 541000,China
  • Received:2021-01-28 Revised:2021-05-19 Online:2021-11-15 Published:2021-11-10
  • About author:MA Yi-fan,born in 1996,postgraduate.Her main research interests include na-tural language processing and so on.
    CAO Cun-gen,born in 1964,Ph.D,professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include large-scale knowledge process and so on.
  • Supported by:
    Key Research and Development Projects of the Ministry of Science and Technology(2017YFC1700302),Beijing NOVA Program(Cross-discipline,Z191100001119014),National Key Research and Development Program of China(2017YFB1002300) and National Natural Science Foundation of China(61967002).

摘要: 精细化的领域文本分析是高质量领域知识获取的重要前提,它通常依赖于大量某种形式的语义文法产生式,但总结这些文法通常耗时耗力。对此,文中提出了一种基于容错Earley解析算法的语义文法自动学习方法,根据种子文法自动生成新的语义文法(包括词类和文法产生式),以减少人工成本。该方法利用优化后的容错Earley解析器,对输入的语句进行容错解析,然后根据容错解析生成的解析树产生候选语义文法,最后对候选语义文法进行过滤或纠正得到最终的语义文法。在5种不同疾病的中医医案的实验中,该方法的词类学习的正确率达到63.88%,文法产生式学习的正确率达到81.78%。

关键词: 过滤算法, 容错Earley解析, 文法学习, 语义纠正, 语义文法

Abstract: Refined domain text analysis is an important prerequisite for high-quality domain knowledge acquisition.It usually relies on a large number of some form of semantic grammars,but summarizing them is often time-consuming and labor-intensive.In this paper,an automatic learning method of semantic grammar based on fault-tolerant Earley parsing algorithm is proposed,which automatically generates new semantic grammars (including lexicons and grammar production rules) according to seed grammar to reduce labor costs.This method uses the optimized fault-tolerant Earley parser to perform fault-tolerant parsing on the input statements,and then generates candidate semantic grammars based on the parse tree generated by the fault-tolerant parsing.Finally,the candidate semantic grammars are filtered or corrected to obtain the final semantic grammars.In the experiment of five TCM medical records with different diseases,the precision rate of learning new lexicons is 63.88%,and precision rate of learning new grammar production rules is 81.78%.

Key words: Fault-tolerant Earley parser, Filtering algorithm, Grammar learning, Semantic correction, Semantic grammar

中图分类号: 

  • TP391
[1]SARAWAGI S.Information Extraction[J].IEEE IntelligentSystems,2015,30(3):8-15.
[2]PAULHEIM H,CIMIANO P.Knowledge graph refinement:A survey of approaches and evaluation methods[J].Semantic Web,2017,8(3):489-508.
[3]LIU Y C,LI H Y.Survey of Domain Knowledge Graph Research[J].Computer System Application,2020,29(6):1-12.
[4]ALANI H,SANGHEE K,MILLARD D,et al.Automatic onto-logy-based knowledge extraction from Web documents[J].Intelligent Systems,IEEE,2003,18(1):14-21.
[5]VARGAS-VERA M,MOTTA E,DOMINGUE J,et al.Know-ledge extraction by using an ontology-based annotation tool[C]//K-cap Workshop on Knowledge Markup & Semantic Annotation.2001.
[6]GARCEZ A,BRODA K,GABBAY D M.Symbolic knowledge extraction from trained neural networks:A sound approach[J].Artificial intelligence,2001,125(1/2):155-207.
[7]BOGER Z,GUTERMAN H.Knowledge extraction from artificial neural network models[C]//IEEE International Conference on Systems,Man and Cybernetics.Computational Cybernetics and Simulation.IEEE,1997.
[8]CUNGEN C,QIANGZE F,YING G,et al.Progress in the development of national knowledge infrastructure[J].Journal of Computer Science & Technology,2002,17(5):523-534.
[9]WANG Y.Research on common sense knowledge acquisitionmethod based on semantic classification[D].Guangxi Normal University,2015.
[10]MA T T.Research on the Design and Optimization Method of Domain Semantic Grammar[D].University of Chinese Academy of Sciences,2020.
[11]SAKAKIBARA Y,MURAMATSU H.Learning Context-Free Grammars from Partially Structured Examples[C]//Lecture Notes in Computer Science(ICGI 2000).Berlin:Springer,2000:229-240.
[12]SAKAKIBARA Y,KONDO M.GA-based Learning of Con-text-Free Grammars using Tabular Representations[C]//Procee-dings of the Sixteenth International Conference on Machine Learning (ICML 1999).Morgan Kaufmann,1999.
[13]SAKAKIBARA Y.Learning context-free grammars using tabular representations[J].Pattern Recognition,2005,38(9):1372-1383.
[14]GRAHAM S L,HARRISON M A,RUZZO W L.An Improved Context-Free Recognizer[J].ACM Transactions on Programming Languages and Systems,1980,2(3):415-462.
[15]NAKAMURA K,MATSUMOTO M.Incremental learning ofcontext free grammars based on bottom-up parsing and search[J].Pattern Recognition,2005,38(9):1384-1392.
[16]NAKAMURA K.Incremental Learning of Context Free Grammars by Bridging Rule Generation and Search for Semi-optimum Rule Sets[C]//International Colloquium on Grammatical Infe-rence.Berlin:Springer,2006.
[17]IMADA K,NAKAMURA K.Search for Minimal and Semi-Minimal Rule Sets in Incremental Learning of Context-Free and Definite Clause Grammars[J].IEICE Transactions on Information & Systems,2010,93-D(5):1197-1204.
[18]WANG D S.Research on domain-specific natural language understanding and semantic grammar learning[D].Beijing:University of Chinese Academy of Sciences,2012.
[19]ZHOU D.Research on Chinese Semantic Grammar Expansion Method Based on Seed Grammar[D].Beijing:University of Chinese Academy of Sciences,2015.
[20]HARADA T,ARAKI O,SAKURAI A.Learning context-freegrammars with recurrent neural networks[C]//International Joint Conference on Neural Networks.IEEE,2002.
[21]COHEN M,CACIULARU A,REJWAN I,et al.Inducing Regular Grammars Using Recurrent Neural Networks[J].arXiv:1710.10453.
[22]SHEN Y K,LIN Z H,HUANG C W,et al.Neural languagemodeling by jointly learning syntax and lexicon[C]//International Conference on Learning Representations.2018.
[23]WU Z K,JOHNSON E,WEI Y,et al.REINAM:reinforcement learning for input-grammar inference[C]//Proceedings of the 2019 27th ACM Joint Meeting on European Software Enginee-ring Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019).Association for Computing Machinery,New York,NY,USA,2019:488-498.
[24]TOMITA M.An Efficient Augmented-Context-Free Parsing Algorithm[J].Computational Linguistics,1987,13(1):31-46.
[25]EARLEY J.An Efficient Context-free Parsing Algorithm[J].Communications of the ACM,1970,26(1):57-61.
[26]GU B,LI R,LIU K Y.Earley Algorithm with Forecasting Stra-tegy[J].Computer Science,2010,37(1):229-232.
[27]FANG F.Research on Semantic Analysis and Knowledge Acquisition from Web Texts[D].University of Chinese Academy of Sciences,2019.
[28]YANG G Z.Analysis and improvement of Earley algorithm[J].Journal of University of Science and Technology of China,1985(S1):90-98.
[29]JACCARD P.The Distribution of the Flora in the Alpine Zone[J].New Phytologist,2010,11(2):37-50.
[1] 周琦,陆叶,李婷玉,王亚,张再跃,曹存根.
基于语义文法的地理实体位置关系的获取
Acquiring Relationships Between Geographical Entities Based on Semantic Grammar
计算机科学, 2016, 43(7): 208-216. https://doi.org/10.11896/j.issn.1002-137X.2016.07.038
[2] 郑志高,刘京,王平,孙圣力.
时间加权不确定近邻协同过滤算法
Time-weighted Uncertain Nearest Neighbor Collaborative Filtering Algorithm
计算机科学, 2014, 41(8): 7-12. https://doi.org/10.11896/j.issn.1002-137X.2014.08.002
[3] 孙德才,王晓霞.
一种基于尾匹配q-gram的近似串匹配算法
Approximate String Matching Using Tail Matched q-gram
计算机科学, 2014, 41(6): 243-249. https://doi.org/10.11896/j.issn.1002-137X.2014.06.048
[4] 侯圣峦,刘磊,曹存根.
基于语义文法的网络舆情精准分析方法研究
Research on Accurate Analysis of Internet Public Opinion:A Semantic Grammar-based Method
计算机科学, 2014, 41(10): 225-231. https://doi.org/10.11896/j.issn.1002-137X.2014.10.048
[5] 陈端兵,周玉林,傅彦.
一种基于邻居信息的最大派系过滤算法
Maximal Clique Percolation Algorithm Based on Neighboring Information
计算机科学, 2011, 38(1): 203-206.
[6] 刘震 佘堃 周明天.
基于Bayes参数估计的垃圾邮件过滤算法研究

计算机科学, 2005, 32(9): 55-57.
[7] 曾庆辉 邱玉辉.
一种基于协作过滤的电子图书推荐系统

计算机科学, 2005, 32(6): 147-150.
[8] 石霞军 林亚平 陈治平.
基于最小风险的贝叶斯邮件过滤算法

计算机科学, 2002, 29(8): 50-51.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!