Computer Science (计算机科学), 2025, Vol. 52, Issue (6A): 240400139-7. doi: 10.11896/jsjkx.240400139
YIN Baosheng (尹宝生), ZONG Chen (宗辰)
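Abstract: Polysemy is pervasive in Chinese. Building a Chinese polysemous-word knowledge base from existing Chinese dictionary resources, with comprehensive sense coverage and standardized sense descriptions, is therefore important for Chinese semantic analysis, intelligent question answering, machine translation, and the tuning and evaluation of the disambiguation ability of large language models (LLMs). This paper analyzes in depth the problems that arise when integrating resources such as the Modern Chinese Dictionary (现代汉语词典) and the Modern Standard Chinese Dictionary (现代汉语规范词典), in particular entries whose senses have the same meaning but are described differently. It proposes a novel polysemous-word sense fusion technique based on LLMs and prompt learning. The technique exploits the ability of LLMs to understand commonsense knowledge and to support decision-making, and it fuses the senses of polysemous words automatically through an effective question-decomposition strategy, prompt-template design, and cross-validation of sense relations. Experimental results show that, on an evaluation set of 754 sense pairs from 50 polysemous words sampled according to a normal distribution, the method achieves a sense-fusion accuracy of 96.26% and a Dice coefficient of 0.9733. The study demonstrates the feasibility and effectiveness of using LLMs to process Chinese knowledge resources automatically: compared with the traditional workflow that relies on linguistic experts, it substantially improves knowledge-processing efficiency while maintaining high quality.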
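The abstract does not specify how the sense-relation cross-validation or the Dice coefficient is computed. The Python sketch below shows one plausible reading, under stated assumptions: a candidate pair of senses (one from each dictionary) is merged only if a judge answers "same meaning" in both argument orders, and Dice is taken over the set of merged pairs P against a gold set G as 2|P ∩ G| / (|P| + |G|). The judge is a hypothetical stand-in for a prompted LLM call, and the function names, toy sense IDs, and bidirectional check are illustrative assumptions, not the authors' method.

```python
from typing import Callable, List, Set, Tuple

# A candidate pair: (sense ID from dictionary A, sense ID from dictionary B).
Pair = Tuple[str, str]


def cross_validated_merge(pairs: List[Pair], judge: Callable[[str, str], bool]) -> Set[Pair]:
    """Accept a merge only if the judge answers 'same meaning' in BOTH argument orders.

    `judge` is a hypothetical stand-in for a prompted LLM call; in practice it would
    receive the two sense definitions rather than their IDs.
    """
    return {(a, b) for a, b in pairs if judge(a, b) and judge(b, a)}


def accuracy_and_dice(pairs: List[Pair], predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Pair-level accuracy, and Dice = 2|P ∩ G| / (|P| + |G|) over merged pairs."""
    # A pair is decided correctly if predicted and gold agree on merge vs. no merge.
    correct = sum((p in predicted) == (p in gold) for p in pairs)
    accuracy = correct / len(pairs)
    size_sum = len(predicted) + len(gold)
    dice = 2 * len(predicted & gold) / size_sum if size_sum else 1.0
    return accuracy, dice


# Toy data (invented for illustration): the judge says "yes" for these ordered pairs only.
YES_VOTES = {("A1", "B1"), ("B1", "A1"), ("A2", "B2"), ("B2", "A2"), ("A1", "B2")}


def toy_judge(x: str, y: str) -> bool:
    return (x, y) in YES_VOTES


pairs = [("A1", "B1"), ("A1", "B2"), ("A2", "B2"), ("A3", "B3")]
gold = {("A1", "B1"), ("A2", "B2"), ("A3", "B3")}

predicted = cross_validated_merge(pairs, toy_judge)
accuracy, dice = accuracy_and_dice(pairs, predicted, gold)
print(sorted(predicted))  # [('A1', 'B1'), ('A2', 'B2')]
print(accuracy, dice)     # 0.75 0.8
```

In this toy run the pair ("A1", "B2") receives a "yes" in only one direction and is rejected. Dropping the `judge(b, a)` check would accept it as a false positive, lowering accuracy to 0.5 and Dice to 4/6 (about 0.67).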