Computer Science ›› 2021, Vol. 48 ›› Issue (11): 276-286. doi: 10.11896/jsjkx.210100218
马一帆1, 马涛涛2, 方芳3, 王石2, 唐素勤4, 曹存根2
MA Yi-fan1, MA Tao-tao2, FANG Fang3, WANG Shi2, TANG Su-qin4, CAO Cun-gen2
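Abstract: Fine-grained analysis of domain text is an important prerequisite for acquiring high-quality domain knowledge. Such analysis usually relies on a large number of semantic grammar productions of some form, but writing these grammars by hand is time-consuming and labor-intensive. To address this, this paper proposes a method for automatically learning semantic grammars based on a fault-tolerant Earley parsing algorithm. The method generates new semantic grammars (including word classes and grammar productions) from a seed grammar in order to reduce manual cost. It first uses an optimized fault-tolerant Earley parser to parse the input sentences, then derives candidate semantic grammars from the parse trees produced by fault-tolerant parsing, and finally filters or corrects the candidates to obtain the final semantic grammar. In experiments on traditional Chinese medicine case records covering 5 different diseases, the method achieves an accuracy of 63.88% for word-class learning and 81.78% for grammar-production learning.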
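For readers unfamiliar with Earley parsing, the following is a minimal sketch of a standard Earley recognizer, the algorithm that the fault-tolerant parser in the abstract builds on. It is not the paper's optimized fault-tolerant variant: it only accepts or rejects a token sequence and performs no error recovery or grammar learning. The toy grammar, the token strings, and the names `Item` and `earley_recognize` are hypothetical and chosen only for illustration. The sketch also assumes the grammar has no empty productions.

```python
from collections import namedtuple

# An Earley item: production lhs -> rhs, dot position, and the chart index
# where recognition of this production started.
Item = namedtuple("Item", "lhs rhs dot origin")


def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Assumption: the grammar contains no empty (epsilon) productions.
    """
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add(Item(start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            item = agenda.pop()
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    for rhs in grammar[symbol]:
                        new = Item(symbol, rhs, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < len(tokens) and tokens[i] == symbol:
                    # Scanner: the terminal matches the next input token.
                    chart[i + 1].add(
                        Item(item.lhs, item.rhs, item.dot + 1, item.origin)
                    )
            else:
                # Completer: advance every item that was waiting for item.lhs.
                for parent in list(chart[item.origin]):
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        new = Item(parent.lhs, parent.rhs,
                                   parent.dot + 1, parent.origin)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)

    return any(
        it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
        for it in chart[len(tokens)]
    )


# Hypothetical toy seed grammar (not taken from the paper).
TOY_GRAMMAR = {
    "S": [("Patient", "Finding")],
    "Patient": [("patient",)],
    "Finding": [("has", "Symptom")],
    "Symptom": [("cough",), ("fever",)],
}
```

With this toy grammar, `earley_recognize(TOY_GRAMMAR, "S", ["patient", "has", "cough"])` accepts the sentence, while a sentence with a missing word such as `["patient", "cough"]` is rejected. A fault-tolerant parser of the kind described in the abstract would instead try to continue past such mismatches so that the partial parse can suggest new word classes or productions.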