Computer Science ›› 2024, Vol. 51 ›› Issue (2): 238-244. doi: 10.11896/jsjkx.221100266
LIAO Xingbin, QIAN Yangge, WANG Qianlei, QIN Xiaolin
Abstract: A fundamental task in document classification is representing input features efficiently; sentence and document vector representations also support downstream natural language processing tasks such as text sentiment analysis and data leakage prevention. Feature representation has increasingly become both a performance bottleneck for document classification and one of the keys to model interpretability. To address the heavy redundant computation and lack of interpretability in existing hierarchical models, this paper proposes a hierarchical document classification model and studies how sentence and document representation methods affect document classification performance. The proposed model integrates a sentence encoder and a document encoder, each fusing input feature vectors with an improved self-attention mechanism, into a hierarchy that processes document-level data level by level, simplifying computation while enhancing interpretability. Compared with a model that uses only the pretrained language model's special-token vector as the sentence representation, the proposed model achieves an average 4% performance gain on five public document classification datasets, and a 2% gain over a model that uses the mean of the attention outputs over the word-vector matrix.
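The hierarchy described above (token vectors pooled into sentence vectors, sentence vectors pooled into one document vector) can be sketched with simple dot-product attention pooling. This is a minimal NumPy illustration, not the paper's exact architecture: the learned query vectors `w_sent` and `w_doc` and the single-query pooling scheme are assumptions standing in for the paper's improved self-attention fusion.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, w):
    """Pool a matrix of vectors H (n, d) into one vector (d,)
    using attention weights from a learned query w (d,)."""
    weights = softmax(H @ w)        # (n,) attention distribution
    return weights @ H, weights     # pooled vector and its weights

rng = np.random.default_rng(0)
d = 8
# A toy document: two sentences of 5 and 7 token vectors each.
doc = [rng.normal(size=(5, d)), rng.normal(size=(7, d))]
w_sent = rng.normal(size=d)  # hypothetical sentence-level query
w_doc = rng.normal(size=d)   # hypothetical document-level query

# Level 1: pool each sentence's tokens into a sentence vector.
sent_vecs = np.stack([attention_pool(S, w_sent)[0] for S in doc])
# Level 2: pool sentence vectors into one document vector.
doc_vec, sent_weights = attention_pool(sent_vecs, w_doc)
```

Because each sentence is encoded once and reused at the document level, long documents avoid re-encoding overlapping token windows, and the per-sentence attention weights (`sent_weights`) offer a direct handle for interpretability, which is the motivation the abstract gives for the hierarchical design.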