Computer Science ›› 2024, Vol. 51 ›› Issue (2): 238-244. doi: 10.11896/jsjkx.221100266

• Artificial Intelligence •

Hierarchical Document Classification Method Based on Improved Self-attention Mechanism and Representation Learning

LIAO Xingbin, QIAN Yangge, WANG Qianlei, QIN Xiaolin

  1. Laboratory for Automated Reasoning and Programming, Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610213, China; 2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100080, China
  • Received: 2022-11-30 Revised: 2023-04-06 Online: 2024-02-15 Published: 2024-02-22
  • Corresponding author: QIN Xiaolin (qinxl2001@126.com)
  • About author: LIAO Xingbin (liaoxingbin20@mails.ucas.ac.cn)
  • Supported by:
    Sichuan Science and Technology Program (2019ZDZX0006, 2020YFQ0056); Science and Technology Service Network Initiative of the Chinese Academy of Sciences, Regional Key Project Class A (KFJ-STS-QYZD-2021-21-001)

Hierarchical Document Classification Method Based on Improved Self-attention Mechanism and Representation Learning

LIAO Xingbin, QIAN Yangge, WANG Qianlei, QIN Xiaolin   

  1. Laboratory for Automated Reasoning and Programming, Chengdu Institute of Computer Applications, CAS, Chengdu 610213, China; 2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100080, China
  • Received: 2022-11-30 Revised: 2023-04-06 Online: 2024-02-15 Published: 2024-02-22
  • About author: LIAO Xingbin, born in 1994, postgraduate, is a member of CCF (No. J9063G). His main research interests include natural language processing and artificial intelligence. QIN Xiaolin, born in 1980, Ph.D., professor, Ph.D. supervisor, is a senior member of CCF (No. 12344M). His main research interests include artificial intelligence and automated reasoning.
  • Supported by:
    Sichuan Science and Technology Program (2019ZDZX0006, 2020YFQ0056) and Science and Technology Service Network Initiative (KFJ-STS-QYZD-2021-21-001).

Abstract: A fundamental task in document classification is to study how to represent input features efficiently; sentence and document vector representations can also assist downstream natural language processing tasks such as text sentiment analysis and data leakage prevention. Feature representation has gradually become a performance bottleneck of document classification and one of the keys to model interpretability. To address the extensive repeated computation and the lack of interpretability faced by existing hierarchical models, a hierarchical document classification model is proposed, and the effect of sentence and document representation methods on document classification performance is investigated. The proposed model integrates a sentence encoder and a document encoder that fuse input feature vectors with an improved self-attention mechanism, forming a hierarchy that processes document-level data level by level, which simplifies computation while enhancing model interpretability. Compared with a model that uses only the special token vector of a pre-trained language model as the sentence representation, the proposed model achieves an average performance improvement of 4% on five public document classification datasets, and about 2% over a model that uses the mean of the attention outputs of the word vector matrix.

Key words: Sentence representation, Document representation, Attention mechanism, Document classification, Model interpretability

Abstract: An essential task of document classification is to study how to effectively represent input features, and sentence and document vector representations can also assist downstream tasks in natural language processing, such as text sentiment analysis and data leakage prevention. Feature representation is increasingly becoming a performance bottleneck of document classification and one of the keys to model interpretability. A hierarchical document classification model is proposed to address the problems of extensive repetitive computation and lack of interpretability faced by existing hierarchical models, and the performance effects of sentence and document representations on the document classification problem are investigated. The proposed model integrates a sentence encoder and a document encoder that fuse input feature vectors using an improved self-attention mechanism, forming a hierarchy that enables hierarchical processing of document-level data, simplifying the computation while enhancing the interpretability of the model. Compared with the model that only uses the special token vector of pre-trained language models as the sentence representation, the proposed model achieves an average performance improvement of 4% on five public document classification datasets, and about 2% over the model that uses the mean of the attention outputs of the word vector matrix.
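
To make the described hierarchy concrete, the following is a minimal PyTorch sketch of a two-level (sentence encoder followed by document encoder) classifier. It is not the authors' implementation: every module and parameter name (AttentionPool, HierarchicalClassifier, the hidden size, the number of heads) is a hypothetical placeholder, the "improved self-attention" fusion is approximated by a plain learned attention pooling over token vectors, and self-attention over sentence vectors is provided by a standard Transformer encoder layer.

    # Minimal sketch (assumed names, not the paper's code): a sentence encoder pools
    # token vectors into sentence vectors, a document encoder applies self-attention
    # over the sentence vectors, and a second pooling fuses them into a document vector.
    import torch
    import torch.nn as nn

    class AttentionPool(nn.Module):
        """Fuse a sequence of vectors into one vector with learned attention weights."""
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, x, mask=None):
            # x: (batch, seq_len, dim); mask: (batch, seq_len), 1 for real positions
            scores = self.score(x).squeeze(-1)
            if mask is not None:
                scores = scores.masked_fill(mask == 0, float("-inf"))
            weights = torch.softmax(scores, dim=-1)          # attention distribution over positions
            return torch.einsum("bs,bsd->bd", weights, x)    # weighted sum of the vectors

    class HierarchicalClassifier(nn.Module):
        """Sentence encoder and document encoder stacked into a hierarchy."""
        def __init__(self, dim=768, n_heads=8, n_classes=5):
            super().__init__()
            self.sent_pool = AttentionPool(dim)              # token vectors -> sentence vector
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
            self.doc_encoder = nn.TransformerEncoder(layer, num_layers=1)  # self-attention over sentences
            self.doc_pool = AttentionPool(dim)               # sentence vectors -> document vector
            self.classifier = nn.Linear(dim, n_classes)

        def forward(self, token_vecs, token_mask):
            # token_vecs: (batch, n_sents, n_tokens, dim), e.g. word vectors from a
            # frozen pre-trained encoder; token_mask: (batch, n_sents, n_tokens)
            b, s, t, d = token_vecs.shape
            sent_vecs = self.sent_pool(token_vecs.view(b * s, t, d),
                                       token_mask.view(b * s, t)).view(b, s, d)
            sent_vecs = self.doc_encoder(sent_vecs)          # contextualize sentences within the document
            doc_vec = self.doc_pool(sent_vecs)               # fuse sentences into one document vector
            return self.classifier(doc_vec)                  # class logits

    # Example: 2 documents, 4 sentences each, 16 tokens per sentence, 768-d word vectors.
    model = HierarchicalClassifier(dim=768, n_heads=8, n_classes=5)
    logits = model(torch.randn(2, 4, 16, 768), torch.ones(2, 4, 16))

Under this sketch, the two baselines mentioned in the abstract correspond to replacing sent_pool with either the pre-trained model's special token ([CLS]) vector or a plain mean over the token vectors.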

Key words: Sentence representation, Document representation, Attention mechanism, Document classification, Model interpretability

CLC number: 

  • TP183