Computer Science ›› 2025, Vol. 52 ›› Issue (4): 255-261. doi: 10.11896/jsjkx.240100155

• Artificial Intelligence •

Automatic Identification and Classification of Topical Discourse Markers

YANG Jincai1, YU Moyang1, HU Man1, XIAO Ming2

  1 School of Computer Science, Central China Normal University, Wuhan 430079, China
  2 Research Center for Language and Language Education, Central China Normal University, Wuhan 430079, China
  • Received: 2024-01-22 Revised: 2024-05-16 Online: 2025-04-15 Published: 2025-04-14
  • Corresponding author: YANG Jincai (jcyang@mail.ccnu.edu.cn)
  • About author: YANG Jincai, born in 1976, professor, doctoral supervisor, is a member of CCF (No. 35662M). His main research interests include advanced databases and information systems, Chinese information processing, artificial intelligence, and natural language processing.
  • Supported by:
    National Social Science Foundation of China (19BYY092) and Humanities and Social Science Foundation of the Ministry of Education of China (20YJA740047).

Abstract: Discourse markers are linguistic markers at the pragmatic level that organize discourse, guide the interpretation of meaning, and express emotion, and they have therefore attracted extensive attention in linguistics. Accurately identifying discourse markers and their categories is important for understanding a text and for grasping the speaker's intention and emotion. Over the past decade, scholars in China and abroad have studied the functions, characteristics, sources, and systematic classification of discourse markers and have achieved rich results. However, because discourse markers vary in form, come from diverse sources, have abstract features, and occur in many variants, they are difficult for machines to identify automatically. To address this, the paper takes topical discourse markers as its research object and proposes an NFLAT pointer network model that integrates external linguistic features to automatically identify and classify discourse markers in discourse. Experiments show that the trained model reaches a precision (P) of 94.55% in identifying and classifying topical discourse markers.

Key words: Discourse marker, Semantic enhancement, Feature fusion, Automatic identification and classification

CLC Number:

  • TP391