Computer Science, 2023, Vol. 50, Issue (9): 278-286. doi: 10.11896/jsjkx.221200133
易流, 耿新宇, 白静
YI Liu, GENG Xinyu, BAI Jing
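Abstract: Natural language processing (NLP) is an important direction in artificial intelligence and machine learning; its goal is to use computers to analyze, understand, and process natural language. A key research topic in NLP is extracting information from text and automatically classifying and tagging that text according to a given label system or standard. Compared with single-label text classification, multi-label text classification allows one sample to belong to several labels at once, which makes it harder to extract features for multiple categories from the text. Hierarchical multi-label text classification is a special case: it assigns the information in a text to label systems whose categories depend on one another through hierarchical relations. The key to the problem is therefore how to exploit the hierarchical relations within the label system to assign text to the correct labels more accurately. To this end, a hierarchical multi-label text classification algorithm based on parallel convolutional network information fusion is proposed. The algorithm first uses the BERT model to produce word embeddings of the text, then applies a self-attention mechanism to enhance the semantic features of the text, and then extracts features from the text data with different convolution kernels. A threshold-controlled tree structure establishes the relations between upper-level and lower-level nodes, so that the multi-faceted semantic information of the text is used more effectively for the hierarchical multi-label text classification task. Results on the public Kanshan-Cup dataset and the CI enterprise information dataset show that the algorithm outperforms the mainstream baseline models TextCNN, TextRNN, and FastText on three evaluation metrics (macro-precision, macro-recall, and micro-F1), indicating good hierarchical multi-label text classification performance.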
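The abstract does not spell out how the threshold-controlled tree structure works, so the following is only a minimal sketch of one plausible reading: a label is kept when its score reaches a threshold and all of its ancestors are kept as well. The label names, the `PARENT` map, the scores, the threshold value, and the function `decode_hierarchical` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Set

# Hypothetical label hierarchy: child -> parent (None marks a top-level label).
PARENT: Dict[str, Optional[str]] = {
    "technology": None,
    "sports": None,
    "machine_learning": "technology",
    "databases": "technology",
    "football": "sports",
}


def decode_hierarchical(scores: Dict[str, float],
                        parent: Dict[str, Optional[str]],
                        threshold: float = 0.5) -> Set[str]:
    """Keep a label only if it and every one of its ancestors reach the threshold.

    This is an assumed decoding rule for illustration, not the paper's method.
    """
    def accepted(label: Optional[str]) -> bool:
        # Walk from the label up to the root, checking each score on the way.
        while label is not None:
            if scores.get(label, 0.0) < threshold:
                return False
            label = parent[label]
        return True

    return {label for label in parent if accepted(label)}


# Hypothetical per-label scores, e.g. sigmoid outputs of a classifier head.
scores = {
    "technology": 0.91,
    "machine_learning": 0.78,
    "databases": 0.22,
    "sports": 0.30,
    "football": 0.65,  # high score, but its parent "sports" is rejected
}

predicted = decode_hierarchical(scores, PARENT, threshold=0.5)
print(sorted(predicted))  # ['machine_learning', 'technology']
```

In this sketch "football" is dropped even though its own score is above the threshold, because its parent "sports" is not accepted; this is the kind of parent-child consistency that a hierarchical label system calls for.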