Computer Science ›› 2024, Vol. 51 ›› Issue (5): 250-257.doi: 10.11896/jsjkx.231100134

• Artificial Intelligence •

Hierarchical Classification of Very Short Texts Combining Semantic Interpretation and DeBERTa

CHEN Haoyang, ZHANG Lei   

  1. State Key Laboratory for Novel Software Technology,Nanjing University,Nanjing 210023,China
  • Received:2023-11-20 Revised:2024-02-22 Online:2024-05-15 Published:2024-05-08
  • About author:CHEN Haoyang,born in 2003,undergraduate.His main research interests include NLP text classification and question answering.
    ZHANG Lei,born in 1987,assistant researcher.His main research interests include artificial intelligence,intelligent agents,and multi-agent systems.
  • Supported by:
    National Natural Science Foundation of China(62192783,62376117) and Collaborative Innovation Center of Novel Software Technology and Industrialization at Nanjing University.

Abstract: Hierarchical text classification has important applications in scenarios such as social-comment topic classification and search-query classification. The data in these scenarios typically consists of short texts, whose information is sparse and highly sensitive to individual words, which poses great challenges for feature representation and classification performance. The complexity and interdependence of the hierarchical label space further exacerbate these difficulties. In view of this, a method fusing semantic interpretation with the DeBERTa model is proposed. Its core ideas are as follows: introduce the semantic interpretation of individual words or phrases in their specific context to supplement and refine the content information available to the model, and exploit DeBERTa's disentangled attention and enhanced mask decoder to better capture positional information and strengthen feature extraction. The method first performs syntactic disambiguation and part-of-speech tagging on the training text, then builds a GlossDeBERTa model to carry out word sense disambiguation with high accuracy, yielding a semantically interpreted sequence. The SimCSE framework is then used to vectorize the interpreted sequence so as to better characterize its sentence-level information. Finally, the training text is passed through the DeBERTa network to obtain feature vector representations of the original text, which are summed with the corresponding vectors of the interpreted sequence and fed into a multi-class classifier. The experiments select the very-short-text portion of the short-text hierarchical classification dataset TREC and augment it, yielding a dataset with an average length of 12 words. Multiple sets of comparison experiments show that the proposed DeBERTa model with fused semantic interpretation performs best: its Accuracy, F1-micro, and F1-macro values on the validation and test sets clearly exceed those of the other models, so it copes well with hierarchical classification of very short texts.
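
The disambiguation stage described above follows the GlossBERT family of approaches [21]: each ambiguous word is paired with every candidate gloss, and a classifier scores the pairs to pick the best sense. Below is a minimal sketch of that pair construction, assuming WordNet (via NLTK) as the gloss inventory and a "[SEP]"-joined input template; the paper's actual gloss source and input format are not given in the abstract, so both are illustrative.

```python
# Hypothetical GlossBERT/GlossDeBERTa-style context-gloss pair construction.
# Assumes WordNet via NLTK as the gloss inventory (run nltk.download("wordnet")
# once); the paper's actual gloss source and template are not specified here.
from nltk.corpus import wordnet as wn

def context_gloss_pairs(sentence: str, target: str):
    """Pair the sentence with every WordNet gloss of the target word.
    A binary classifier then scores each pair, and the highest-scoring
    gloss becomes the word's semantic interpretation."""
    return [(f"{sentence} [SEP] {target}: {synset.definition()}", synset.name())
            for synset in wn.synsets(target)]

# Example: disambiguating "cone" in a TREC-style question.
for text, sense in context_gloss_pairs("How much does a pine cone weigh ?", "cone"):
    print(sense, "->", text)
```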
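The fusion step itself is simple to express: encode the original text with DeBERTa, encode the interpreted (gloss) sequence with a SimCSE encoder, sum the two sentence vectors, and classify. The sketch below assumes the public checkpoints microsoft/deberta-base and princeton-nlp/sup-simcse-bert-base-uncased (both 768-dimensional) and uses the first-token hidden state as the sentence vector; the paper's exact checkpoints, pooling strategy, and classifier head are not specified in the abstract.

```python
# Minimal sketch of the fusion described in the abstract: DeBERTa features for
# the original text plus SimCSE features for its semantic interpretation,
# summed element-wise and fed to a linear multi-class classifier. Checkpoint
# names and pooling are assumptions, not the paper's confirmed configuration.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class GlossFusionClassifier(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        self.text_tok = AutoTokenizer.from_pretrained("microsoft/deberta-base")
        self.text_enc = AutoModel.from_pretrained("microsoft/deberta-base")
        self.gloss_tok = AutoTokenizer.from_pretrained(
            "princeton-nlp/sup-simcse-bert-base-uncased")
        self.gloss_enc = AutoModel.from_pretrained(
            "princeton-nlp/sup-simcse-bert-base-uncased")
        # Both base-size encoders emit 768-dim hidden states, so the two
        # sentence vectors can be summed before classification.
        self.classifier = nn.Linear(768, num_labels)

    @staticmethod
    def _encode(tok, enc, texts):
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = enc(**batch).last_hidden_state   # (batch, seq_len, 768)
        return hidden[:, 0]                       # first-token sentence vector

    def forward(self, texts, glosses):
        text_vec = self._encode(self.text_tok, self.text_enc, texts)
        gloss_vec = self._encode(self.gloss_tok, self.gloss_enc, glosses)
        return self.classifier(text_vec + gloss_vec)  # logits over labels

# TREC's fine-grained label set has 50 classes; the gloss string is illustrative.
model = GlossFusionClassifier(num_labels=50)
logits = model(["What is the capital of France ?"],
               ["capital : the seat of government of a country"])
```

Summing rather than concatenating keeps the classifier input at the encoder's native width, so the gloss vector acts as an additive refinement of the text representation rather than a separate feature channel.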

Key words: Very short text, Hierarchical classification, Semantic interpretation, DeBERTa, GlossDeBERTa, SimCSE

CLC Number: 

  • TP391.1
[1]BANERJEE S,AKKAYA C,PEREZ-SORROSAL F,et al.Hierarchical Transfer Learning for Multi-label Text Classification[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.Stroudsburg,PA:Association for Computational Linguistics,2019:6295-6300.
[2]ZHOU J,MA C P,LONG D K,et al.Hierarchy-Aware Global Model for Hierarchical Text Classification[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.Stroudsburg,PA:Association for Computational Linguistics,2020:1106-1117.
[3]CHEN H B,MA Q L,LIN Z X,et al.Hierarchy-aware label semantics matching network for hierarchical text classification[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing.Stroudsburg,PA:Association for Computational Linguistics,2021:4370-4379.
[4]HUANG C M,WANG S L.Research on Short Text Classification Based on Bag of Words and TF-IDF[J].Software Engineering,2020,23(3):1-3.
[5]WALLACH H M.Topic Modeling:Beyond Bag-of-Words[C]//Proceedings of the 23rd International Conference on Machine Learning.New York:ACM,2006:977-984.
[6]CHEN Q,YAO L,YANG J.Short text classification based on LDA topic model[C]//Proceedings of the 2016 International Conference on Audio,Language and Image Processing(ICALIP).Piscataway:IEEE,2016:749-753.
[7]DEVLIN J,CHANG M,LEE K,et al.BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.Stroudsburg,PA:Association for Computational Linguistics,2019:4171-4186.
[8]LIU Y,OTT M,GOYAL N,et al.RoBERTa:A Robustly Optimized BERT Pretraining Approach[EB/OL].https://arxiv.org/abs/1907.11692.
[9]CHEN L C,QIN J,LU W D,et al.Short text classification method based on self-attention mechanism[J].Computer Engineering and Design,2022,43(3):728-734.
[10]HU Y,LI Y,YANG T,et al.Short Text Classification with A Convolutional Neural Networks Based Method[C]//Proceedings of the 2018 15th International Conference on Control,Automation,Robotics and Vision(ICARCV).Piscataway:IEEE,2018:1432-1435.
[11]LYU S,LIU J.Combine Convolution with Recurrent Networks for Text Classification[EB/OL].https://arxiv.org/abs/2006.15795.
[12]YANG F H,WANG X W,LI J.BERT-TextCNN-based classification of short texts from clinical trials [J].Chinese Journal of Medical Library and Information Science,2021,30(1):54-59.
[13]LIU Y,ZHANG K,HUANG Z,et al.Enhancing Hierarchical Text Classification through Knowledge Graph Integration[C]//Findings of the Association for Computational Linguistics:ACL 2023.Stroudsburg,PA:Association for Computational Linguistics,2023:5797-5810.
[14]LI B H,XIANG Y X,FENG D,et al.Short Text Classification Model Combining Knowledge Aware and Dual Attention[J].Journal of Software,2022,33(10):3565-3581.
[15]HOPPE F.Improving Zero-Shot Text Classification with Graph-based Knowledge Representations[C]//Proceedings of the Doctoral Consortium at ISWC 2022(CEUR Workshop Proceedings,Vol.3165).2022.
[16]ZHENG K X,WANG Y Q,YAO Q M,et al.Simplified Graph Learning for Inductive Short Text Classification[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing.Stroudsburg,PA:Association for Computational Linguistics,2022:10717-10724.
[17]HE P,LIU X,GAO J,et al.DeBERTa:Decoding-enhanced BERT with Disentangled Attention[EB/OL].https://arxiv.org/abs/2006.03654.
[18]LESK M.Automatic sense disambiguation using machine readable dictionaries:how to tell a pine cone from an ice cream cone[C]//Proceedings of the 5th Annual International Conference on Systems Documentation.New York:ACM,1986:24-26.
[19]DIAB M,RESNIK P.An unsupervised method for word sense tagging using parallel corpora[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.Stroudsburg,PA:Association for Computational Linguistics,2002:255-262.
[20]BARBA E,PROCOPIO L,NAVIGLI R.ExtEnD:Extractive Entity Disambiguation[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics.Stroudsburg,PA:Association for Computational Linguistics,2022:2478-2488.
[21]HUANG L,SUN C,QIU X,et al.GlossBERT:BERT for Word Sense Disambiguation with Gloss Knowledge[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing(EMNLP-IJCNLP).Stroudsburg,PA:Association for Computational Linguistics,2019:3509-3514.
[22]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.New York:ACM,2017:6000-6010.
[23]GAO T,YAO X,CHEN D.SimCSE:Simple Contrastive Learning of Sentence Embeddings[EB/OL].https://arxiv.org/abs/2104.08821.
[24]HOVY E,GERBER L,HERMJAKOB U,et al.Toward semantics-based answer pinpointing[C]//Proceedings of the First International Conference on Human Language Technology Research.New York:ACM,2001:1-7.
[25]HUANG Z,XU W,YU K.Bidirectional LSTM-CRF models for sequence tagging[EB/OL].https://arxiv.org/abs/1508.01991.
[26]KIM Y.Convolutional Neural Networks for Sentence Classification[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing(EMNLP).Stroudsburg,PA:Association for Computational Linguistics,2014:1746-1751.
[27]SANH V,DEBUT L,CHAUMOND J,et al.DistilBERT,a distilled version of BERT:smaller,faster,cheaper and lighter[EB/OL].https://arxiv.org/abs/1910.01108.
[28]WAN Y,GAO Q.An ensemble sentiment classification system of Twitter data for airline services analysis[C]//Proceedings of the 2015 IEEE International Conference on Data Mining Workshop(ICDMW).Piscataway:IEEE,2015:1318-1325.