Computer Science ›› 2024, Vol. 51 ›› Issue (5): 250-257. doi: 10.11896/jsjkx.231100134
陈昊飏, 张雷
CHEN Haoyang, ZHANG Lei
Abstract: Hierarchical text classification has important applications in scenarios such as topic classification of social media comments and search query classification. The data in these scenarios are typically extremely short texts, marked by information sparsity and sensitivity, which poses serious challenges to a model's feature representation and classification performance; the complexity and interdependence of the hierarchical label space make the task harder still. To address this, a method that fuses semantic glosses with the DeBERTa model is proposed. Its core ideas are twofold: introduce the gloss (sense definition) of each word or phrase in its specific context to supplement the content information available to the model, and exploit DeBERTa's disentangled attention mechanism and enhanced mask decoder to better capture positional information and strengthen feature extraction. The method first applies tokenization and part-of-speech tagging to the training text, then constructs a GlossDeBERTa model to perform high-accuracy word sense disambiguation and produce a gloss sequence. The SimCSE framework is then used to vectorize the gloss sequence so as to better represent its sentence-level information. Finally, the training text is passed through the DeBERTa network to obtain a feature vector for the original text, which is added to the corresponding feature vector of the gloss sequence and fed into a multi-class classifier. For the experiments, the extremely short texts in the hierarchical short-text classification dataset TREC are selected and augmented, yielding a dataset with an average length of 12 words. Multiple comparative experiments show that the proposed gloss-fused DeBERTa model performs best, with Accuracy, F1-micro and F1-macro on both the validation and test sets improving substantially over other models, and it handles extremely-short-text hierarchical classification well.
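A minimal sketch of the pipeline described in the abstract, for illustration only. The GlossDeBERTa checkpoint path, the whitespace split standing in for segmentation and POS tagging, the [CLS] pooling, and the 50-way label head are all assumptions (the paper's fine-tuned weights and exact configuration are not given here); the disambiguation step follows the GlossBERT recipe of scoring (context, gloss) pairs, with glosses drawn from WordNet, sentence vectors from a public SimCSE checkpoint, and fusion by element-wise addition before a linear multi-class classifier:

```python
# Sketch of the gloss-fused DeBERTa pipeline (assumptions noted inline).
import torch
import torch.nn as nn
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")
from transformers import (AutoModel, AutoModelForSequenceClassification,
                          AutoTokenizer)

# (1) Candidate glosses: WordNet sense definitions for a word.
def candidate_glosses(word):
    return [s.definition() for s in wn.synsets(word)]

# (2) GlossBERT-style disambiguation with a DeBERTa backbone.
#     "path/to/gloss-deberta" is a hypothetical fine-tuned checkpoint
#     whose binary head scores whether a gloss matches the context.
wsd_tok = AutoTokenizer.from_pretrained("path/to/gloss-deberta")
wsd_model = AutoModelForSequenceClassification.from_pretrained("path/to/gloss-deberta")

def best_gloss(context, word):
    glosses = candidate_glosses(word)
    if not glosses:
        return None
    enc = wsd_tok([context] * len(glosses), glosses,
                  padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = wsd_model(**enc).logits[:, 1]  # index 1 = "gloss matches"
    return glosses[scores.argmax().item()]

# (3) SimCSE encoder for the concatenated gloss sequence ([CLS] pooling).
sim_tok = AutoTokenizer.from_pretrained("princeton-nlp/sup-simcse-roberta-base")
sim_enc = AutoModel.from_pretrained("princeton-nlp/sup-simcse-roberta-base")

# (4) DeBERTa text encoder plus a linear multi-class head.
txt_tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
txt_enc = AutoModel.from_pretrained("microsoft/deberta-v3-base")
num_classes = 50  # assumed size of the fine-grained TREC label set
classifier = nn.Linear(txt_enc.config.hidden_size, num_classes)

def classify(text):
    # Whitespace split stands in for the paper's tokenization + POS tagging.
    glosses = [g for w in text.split() if (g := best_gloss(text, w))]
    if glosses:
        g_in = sim_tok(" ".join(glosses), truncation=True, return_tensors="pt")
        with torch.no_grad():
            gloss_vec = sim_enc(**g_in).last_hidden_state[:, 0]
    else:
        gloss_vec = 0.0
    t_in = txt_tok(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        text_vec = txt_enc(**t_in).last_hidden_state[:, 0]
    # Fuse by element-wise addition, then score the label classes.
    return classifier(text_vec + gloss_vec).softmax(dim=-1)
```

The sketch only shows the inference path; in training, the encoders and the linear head would be fine-tuned jointly on the hierarchical labels. Note that the two encoders must share a hidden size (768 for both checkpoints above) for the element-wise addition to be valid.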
[1] BANERJEE S, AKKAYA C, PEREZ-SORROSAL F, et al. Hierarchical Transfer Learning for Multi-label Text Classification [C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 6295-6300.
[2] ZHOU J, MA C P, LONG D K, et al. Hierarchy-Aware Global Model for Hierarchical Text Classification [C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 1106-1117.
[3] CHEN H B, MA Q L, LIN Z X, et al. Hierarchy-aware Label Semantics Matching Network for Hierarchical Text Classification [C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2021: 4370-4379.
[4] HUANG C M, WANG S L. Research on Short Text Classification Based on Bag of Words and TF-IDF [J]. Software Engineering, 2020, 23(3): 1-3.
[5] WALLACH H M. Topic Modeling: Beyond Bag-of-Words [C]//Proceedings of the 23rd International Conference on Machine Learning. New York: ACM, 2006: 977-984.
[6] CHEN Q, YAO L, YANG J. Short Text Classification Based on LDA Topic Model [C]//Proceedings of the 2016 International Conference on Audio, Language and Image Processing (ICALIP). Piscataway: IEEE, 2016: 749-753.
[7] DEVLIN J, CHANG M, LEE K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2019: 4171-4186.
[8] LIU Y, OTT M, GOYAL N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach [EB/OL]. https://arxiv.org/abs/1907.11692.
[9] CHEN L C, QIN J, LU W D, et al. Short Text Classification Method Based on Self-attention Mechanism [J]. Computer Engineering and Design, 2022, 43(3): 728-734.
[10] HU Y, LI Y, YANG T, et al. Short Text Classification with a Convolutional Neural Networks Based Method [C]//Proceedings of the 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV). Piscataway: IEEE, 2018: 1432-1435.
[11] LYU S, LIU J. Combine Convolution with Recurrent Networks for Text Classification [EB/OL]. https://arxiv.org/abs/2006.15795.
[12] YANG F H, WANG X W, LI J. BERT-TextCNN-based Classification of Short Texts from Clinical Trials [J]. Chinese Journal of Medical Library and Information Science, 2021, 30(1): 54-59.
[13] LIU Y, ZHANG K, HUANG Z, et al. Enhancing Hierarchical Text Classification through Knowledge Graph Integration [C]//Findings of the Association for Computational Linguistics: ACL 2023. Stroudsburg, PA: Association for Computational Linguistics, 2023: 5797-5810.
[14] LI B H, XIANG Y X, FENG D, et al. Short Text Classification Model Combining Knowledge Aware and Dual Attention [J]. Journal of Software, 2022, 33(10): 3565-3581.
[15] HOPPE F. Improving Zero-Shot Text Classification with Graph-based Knowledge Representations [C]//Proceedings of the Doctoral Consortium at ISWC 2022. FIZ Karlsruhe, 2022, Vol. 3165.
[16] ZHENG K X, WANG Y Q, YAO Q M, et al. Simplified Graph Learning for Inductive Short Text Classification [C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2022: 10717-10724.
[17] HE P, LIU X, GAO J, et al. DeBERTa: Decoding-enhanced BERT with Disentangled Attention [EB/OL]. https://arxiv.org/abs/2006.03654.
[18] LESK M. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone [C]//Proceedings of the 5th Annual International Conference on Systems Documentation. New York: ACM, 1986: 24-26.
[19] DIAB M, RESNIK P. An unsupervised method for word sense tagging using parallel corpora [C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2002: 255-262.
[20] BARBA E, PROCOPIO L, NAVIGLI R. ExtEnD: Extractive Entity Disambiguation [C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2022: 2478-2488.
[21] HUANG L, SUN C, QIU X, et al. GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge [C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: Association for Computational Linguistics, 2019: 3509-3514.
[22] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[23] GAO T, YAO X, CHEN D. SimCSE: Simple Contrastive Learning of Sentence Embeddings [EB/OL]. https://arxiv.org/abs/2104.08821.
[24] HOVY E, GERBER L, HERMJAKOB U, et al. Toward semantics-based answer pinpointing [C]//Proceedings of the First International Conference on Human Language Technology Research. New York: ACM, 2001: 1-7.
[25] HUANG Z, XU W, YU K. Bidirectional LSTM-CRF Models for Sequence Tagging [EB/OL]. https://arxiv.org/abs/1508.01991.
[26] KIM Y. Convolutional Neural Networks for Sentence Classification [C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA: Association for Computational Linguistics, 2014: 1746-1751.
[27] SANH V, DEBUT L, CHAUMOND J, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter [EB/OL]. https://arxiv.org/abs/1910.01108.
[28] WAN Y, GAO Q. An ensemble sentiment classification system of Twitter data for airline services analysis [C]//Proceedings of the 2015 IEEE International Conference on Data Mining Workshop (ICDMW). Piscataway: IEEE, 2015: 1318-1325.