Computer Science ›› 2024, Vol. 51 ›› Issue (2): 238-244. doi: 10.11896/jsjkx.221100266

• Artificial Intelligence •

Hierarchical Document Classification Method Based on Improved Self-attention Mechanism and Representation Learning

LIAO Xingbin, QIAN Yangge, WANG Qianlei, QIN Xiaolin   

  1. Laboratory for Automated Reasoning and Programming, Chengdu Institute of Computer Applications, CAS, Chengdu 610213, China
  2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100080, China
  • Received: 2022-11-30  Revised: 2023-04-06  Online: 2024-02-15  Published: 2024-02-22
  • About author: LIAO Xingbin, born in 1994, postgraduate, is a member of CCF (No. J9063G). His main research interests include natural language processing and artificial intelligence. QIN Xiaolin, born in 1980, Ph.D, professor, Ph.D supervisor, is a senior member of CCF (No. 12344M). His main research interests include artificial intelligence and automated reasoning.
  • Supported by:
    Sichuan Science and Technology Program(2019ZDZX0006,2020YFQ0056) and Science and Technology Service Network Initiative(KFJ-STS-QYZD-2021-21-001).

Abstract: An essential task of document classification is to represent input features effectively; sentence and document vector representations also support downstream natural language processing tasks such as text sentiment analysis and data leakage prevention. Feature representation is increasingly one of the keys to both the performance bottlenecks and the interpretability of document classification. A hierarchical document classification model is proposed to address the problems of extensive repeated computation and lack of interpretability faced by existing hierarchical models, and the effects of sentence and document representations on document classification performance are investigated. The proposed model integrates a sentence encoder and a document encoder, each fusing its input feature vectors with an improved self-attention mechanism; together they form a hierarchy that processes document-level data stage by stage, simplifying computation while enhancing the interpretability of the model. Compared with a model that uses only the special token vector of a pre-trained model as the sentence representation, the proposed model achieves an average improvement of 4% on five public document classification datasets, and about 2% on average over a model that uses the mean attention outputs of the word-vector matrix.
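To make the hierarchy concrete, below is a minimal PyTorch sketch of this kind of two-level architecture (PyTorch being the framework the article cites in reference [25]): a sentence encoder maps each sentence's word vectors to a sentence vector via attention pooling, and a document encoder pools the sentence vectors into a single document vector for classification. The pooling here is plain scored-softmax self-attention standing in for the paper's improved self-attention mechanism, whose exact form the abstract does not specify; all names and hyperparameters (AttentionPooling, HierarchicalClassifier, dim=128, two encoder layers) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Score each position, softmax over positions, and return the
    weighted sum; the weights are also returned so they can be inspected."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, dim)
        weights = torch.softmax(self.score(x).squeeze(-1), dim=-1)
        pooled = torch.einsum("bs,bsd->bd", weights, x)
        return pooled, weights


class HierarchicalClassifier(nn.Module):
    """Sentence encoder -> pool to sentence vectors -> document encoder
    -> pool to a document vector -> linear classifier.
    A sketch only; hyperparameters are illustrative, not the paper's."""

    def __init__(self, vocab_size: int, dim: int = 128, num_classes: int = 5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        layer = lambda: nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.sent_enc = nn.TransformerEncoder(layer(), num_layers=2)
        self.sent_pool = AttentionPooling(dim)
        self.doc_enc = nn.TransformerEncoder(layer(), num_layers=2)
        self.doc_pool = AttentionPooling(dim)
        self.out = nn.Linear(dim, num_classes)

    def forward(self, docs: torch.Tensor):
        # docs: (batch, num_sents, num_words) of token ids
        b, s, w = docs.shape
        # Encode every sentence independently, then pool words -> sentence vector.
        words = self.embed(docs.view(b * s, w))
        sent_vecs, _ = self.sent_pool(self.sent_enc(words))
        sent_vecs = sent_vecs.view(b, s, -1)
        # Encode the sequence of sentence vectors, then pool -> document vector.
        doc_vec, sent_weights = self.doc_pool(self.doc_enc(sent_vecs))
        return self.out(doc_vec), sent_weights


# Toy usage: 2 documents, 8 sentences each, 16 word ids per sentence.
model = HierarchicalClassifier(vocab_size=30000)
logits, sent_weights = model(torch.randint(1, 30000, (2, 8, 16)))
print(logits.shape, sent_weights.shape)  # torch.Size([2, 5]) torch.Size([2, 8])
```

Because the sentence-level attention weights are returned explicitly, one can read off which sentences most influenced a prediction, which is the kind of interpretability benefit the abstract attributes to the hierarchical design.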

Key words: Sentence representation, Document representation, Attention mechanism, Document classification, Model interpretability

CLC Number: TP183
[1]LE Q,MIKOLOV T.Distributed representations of sentences and documents[C]//International Conference on Machine Learning.PMLR,2014:1188-1196.
[2]PAGLIARDINI M,GUPTA P,JAGGI M.Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies,Volume 1(Long Papers).2018:528-540.
[3]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies,Volume 1(Long and Short Papers).2019:4171-4186.
[4]REIMERS N,GUREVYCH I.Sentence-BERT:Sentence Em-beddings using Siamese BERT-Networks[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing(EMNLP-IJCNLP).2019:3982-3992.
[5]YANG Y,CER D,AHMAD A,et al.Multilingual Universal Sentence Encoder for Semantic Retrieval[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics:System Demonstrations.2020:87-94.
[6]ADHIKARI A,RAM A,TANG R,et al.Docbert:Bert for document classification[EB/OL].(2019-04-17)[2019-08-22].https://arxiv.org/abs/1904.08398.
[7]TANAKA H,SHINNOU H,CAO R,et al.Document classification by word embeddings of bert[C]//International Conference of the Pacific Association for Computational Linguistics.Springer,Singapore,2019:145-154.
[8]LUONG M T,PHAM H,MANNING C D.Effective Approaches to Attention-based Neural Machine Translation[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.2015:1412-1421.
[9]LI J,XU Y,SHI H.Bidirectional LSTM with hierarchical attention for text classification[C]//2019 IEEE 4th Advanced Information Technology,Electronic and Automation Control Conference(IAEAC).IEEE,2019,1:456-459.
[10]CHEN Y.Convolutional neural network for sentence classification[D].Ontario:University of Waterloo,2015.
[11]YANG Z,YANG D,DYER C,et al.Hierarchical attention networks for document classification[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2016:1480-1489.
[12]FAN F L,XIONG J,LI M,et al.On interpretability of artificial neural networks:A survey[J].IEEE Transactions on Radiation and Plasma Medical Sciences,2021,5(6):741-760.
[13]HARRIS Z S.Distributional structure[J].Word,1954,10(2/3):146-162.
[14]PAPPAGARI R,ZELASKO P,VILLALBA J,et al.Hierarchical transformers for long document classification[C]//2019 IEEE Automatic Speech Recognition and Understanding Workshop(ASRU).IEEE,2019:838-844.
[15]RUMELHART D E,HINTON G E,WILLIAMS R J.Learning representations by back-propagating errors[J].Nature,1986,323(6088):533-536.
[16]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems(NIPS’17).Curran Associates Inc.,2017:6000-6010.
[17]LI B,ZHOU H,HE J,et al.On the Sentence Embeddings from Pre-trained Language Models[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing(EMNLP).2020:9119-9130.
[18]HUANG J,TANG D,ZHONG W,et al.WhiteningBERT:An Easy Unsupervised Sentence Embedding Approach[C]//Findings of the Association for Computational Linguistics(EMNLP 2021).2021:238-244.
[19]CHOI G,OH S,KIM H.Improving document-level sentiment classification using importance of sentences[J].Entropy,2020,22(12):1336.
[20]CHO K,VAN MERRIËNBOER B,GÜLÇEHRE Ç,et al.Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing(EMNLP).2014:1724-1734.
[21]LI W J,QI F,YU Z T.Sentiment classification method based on multi-channel features and self-attention[J].Journal of Software,2021,32(9):2783-2800.
[22]REN J H,LI J,MENG X F.Document classification method based on context awareness and hierarchical attention network[J].Journal of Frontiers of Computer Science and Technology,2021,15(2):305-314.
[23]LIU Y,OTT M,GOYAL N,et al.Roberta:A robustly optimized bert pretraining approach[EB/OL].(2019-07-26)[2019-07-26].https://arxiv.org/abs/1907.11692.
[24]JAWAHAR G,SAGOT B,SEDDAH D.What Does BERT Learn about the Structure of Language?[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.2019:3651-3657.
[25]PASZKE A,GROSS S,MASSA F,et al.Pytorch:An imperative style,high-performance deep learning library[J].Advances in Neural Information Processing Systems,2019,32:8024-8035.
[26]PENNINGTON J,SOCHER R,MANNING C D.Glove:Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing(EMNLP).2014:1532-1543.
[27]BIRD S,KLEIN E,LOPER E.Natural language processing with Python:analyzing text with the natural language toolkit[M].California:O’Reilly Media,Inc.,2009.
[28]SRIVASTAVA N,HINTON G,KRIZHEVSKY A,et al.Dropout:a simple way to prevent neural networks from overfitting[J].The Journal of Machine Learning Research,2014,15(1):1929-1958.
[29]BA J L,KIROS J R,HINTON G E.Layer normalization[EB/OL].(2016-07-21)[2016-07-21].https://arxiv.org/abs/1607.06450.
[30]ROBBINS H,MONRO S.A stochastic approximation method[J].The Annals of Mathematical Statistics,1951,22(3):400-407.