Computer Science ›› 2024, Vol. 51 ›› Issue (5): 250-257. doi: 10.11896/jsjkx.231100134

• Artificial Intelligence •


Hierarchical Classification of Very Short Texts Combining Semantic Interpretation and DeBERTa

CHEN Haoyang, ZHANG Lei   

  1. State Key Laboratory for Novel Software Technology,Nanjing University,Nanjing 210023,China
  • Received:2023-11-20 Revised:2024-02-22 Online:2024-05-15 Published:2024-05-08
  • Corresponding author:ZHANG Lei(ZhangL@nju.edu.cn)
  • About author:CHEN Haoyang,born in 2003,undergraduate(chen-haoyang@qq.com).His main research interests include NLP text classification and question answering.
    ZHANG Lei,born in 1987,assistant researcher.His main research interests include artificial intelligence,intelligent agents,and multi-agent systems.
  • Supported by:
    National Natural Science Foundation of China(62192783,62376117) and Collaborative Innovation Center of Novel Software Technology and Industrialization at Nanjing University.


Abstract: Hierarchical text classification has important applications in scenarios such as social-comment topic classification and search-query classification. The data in these scenarios often consists of very short texts, characterized by sparse and sensitive information, which poses great challenges for model feature representation and classification performance; the complexity and interdependence of the hierarchical label space further compound the difficulty. To address this, a method fusing semantic interpretation with the DeBERTa model is proposed. Its core ideas are: introducing the semantic interpretation (gloss) of each word or phrase in its specific context, to supplement and enrich the content information available to the model; and exploiting DeBERTa's disentangled attention mechanism and enhanced mask decoder, to better capture positional information and improve feature extraction. The method first performs tokenization and part-of-speech tagging on the training text, then constructs a GlossDeBERTa model to perform high-accuracy word sense disambiguation and obtain a semantic interpretation sequence. The SimCSE framework is then used to vectorize the interpretation sequence so as to better represent its sentence-level information. Finally, the training text is passed through the DeBERTa network to obtain the feature vector representation of the original text, which is summed with the corresponding feature vector of the interpretation sequence and fed into a multi-class classifier. The experiments select the very-short-text portion of the short-text hierarchical classification dataset TREC and augment the data, yielding a dataset with an average length of 12 words. Multiple sets of comparative experiments show that the proposed DeBERTa model fused with semantic interpretation performs best, with considerably higher Accuracy, F1-micro, and F1-macro values on the validation and test sets than the other models, and thus handles very short text hierarchical classification well.
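To make the disambiguation step concrete, the following is a minimal sketch that frames word sense disambiguation GlossBERT-style, as sentence-pair classification with a DeBERTa backbone (the "GlossDeBERTa" idea). It assumes Hugging Face transformers and WordNet glosses via NLTK; the checkpoint name, the binary gloss-selection head, and the pick_gloss helper are illustrative assumptions, not the authors' released code.

# Minimal sketch of the GlossBERT-style WSD step with a DeBERTa backbone
# ("GlossDeBERTa"). Assumes WordNet glosses via NLTK (run
# nltk.download('wordnet') once); checkpoint and helper names are illustrative.
import torch
from nltk.corpus import wordnet as wn
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/deberta-base")
# Binary head: does this (context, gloss) pair match? In practice the weights
# would come from fine-tuning on gloss-selection pairs, as GlossBERT does.
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-base", num_labels=2)

def pick_gloss(sentence: str, word: str) -> str:
    """Return the WordNet gloss that best fits `word` in `sentence`."""
    glosses = [s.definition() for s in wn.synsets(word)]
    if not glosses:
        return word  # no WordNet entry; fall back to the surface form
    # Pair the context sentence with every candidate gloss and score them all.
    enc = tok([sentence] * len(glosses),
              [f"{word} : {g}" for g in glosses],
              padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**enc).logits[:, 1]  # logit of the "matches" class
    return glosses[int(scores.argmax())]

Building on that, the sketch below illustrates the fusion and classification stage described in the abstract: DeBERTa encodes the original text, a SimCSE encoder embeds the interpretation sequence, the two vectors are summed element-wise, and the result is passed to a multi-class classifier. The checkpoint names, the frozen gloss encoder, and the label count are again assumptions for illustration, not the paper's exact configuration.

# Minimal sketch of the fusion stage: DeBERTa text vector + SimCSE gloss
# vector, summed and classified. Checkpoints and label count are placeholders.
import torch.nn as nn
from transformers import AutoModel

class GlossFusionClassifier(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        # DeBERTa encoder for the original very short text; disentangled
        # attention and the enhanced mask decoder are part of the architecture.
        self.text_tok = AutoTokenizer.from_pretrained("microsoft/deberta-base")
        self.text_encoder = AutoModel.from_pretrained("microsoft/deberta-base")
        # SimCSE encoder for the interpretation (gloss) sequence; frozen here
        # so it only supplies sentence embeddings (an assumption of this sketch).
        self.gloss_tok = AutoTokenizer.from_pretrained(
            "princeton-nlp/sup-simcse-roberta-base")
        self.gloss_encoder = AutoModel.from_pretrained(
            "princeton-nlp/sup-simcse-roberta-base")
        for p in self.gloss_encoder.parameters():
            p.requires_grad = False
        # Both base models use 768-dimensional hidden states, so the two
        # sentence vectors can be summed before classification.
        self.classifier = nn.Linear(768, num_labels)

    def forward(self, texts, glosses):
        t = self.text_tok(texts, padding=True, truncation=True,
                          return_tensors="pt")
        h_text = self.text_encoder(**t).last_hidden_state[:, 0]  # [CLS]-style
        g = self.gloss_tok(glosses, padding=True, truncation=True,
                           return_tensors="pt")
        h_gloss = self.gloss_encoder(**g).last_hidden_state[:, 0]
        # Fuse by element-wise addition, then classify over a flattened
        # hierarchical label set (per-level heads would be a variant).
        return self.classifier(h_text + h_gloss)

model = GlossFusionClassifier(num_labels=50)  # e.g. TREC's 50 fine-grained classes
logits = model(["What year did the Titanic sink ?"],
               ["sink : go under water , descend below the surface"])

Note that element-wise addition only works because both base encoders share a 768-dimensional hidden size; with mismatched encoders, a projection layer would be needed before fusion.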

Key words: Very short text, Hierarchical classification, Semantic interpretation, DeBERTa, GlossDeBERTa, SimCSE

CLC Number: 

  • TP391.1