Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 250100059-8. doi: 10.11896/jsjkx.250100059
ZHAO Zhuoyang1, QIN Donghong1,4, BAI Fengbo1,4, LIANG Xianye1, XU Chen1, ZHENG Yuehua1, LIANG Yufeng1, LAN Sheng2,4, ZHOU Guoping3
Abstract: Traditional graph convolutional network (GCN) methods can model graph structure effectively under limited-data conditions, but because they rely on sparse one-hot encodings, their ability to capture contextual relations between words is limited. This problem is especially pronounced in low-resource language settings. Zhuang-text topic classification is a case in point: the task faces not only data scarcity but also the challenge of complex linguistic structure. To address these challenges, this paper proposes ZHA_TGCN, a Zhuang topic-classification method suited to low-resource settings. The method uses the Zhuang pre-trained model ZHA_BERT to extract text features, combines them with Zhuang tone features, and feeds the result into a BiGRU to learn deep semantic representations. The learned representation vectors then serve as document-node features for a GCN, which performs label propagation to learn feature representations for both the training data and the unlabeled test data. Finally, a Softmax layer outputs the classification result. Experimental results show that the proposed method achieves 82.12% accuracy, 90.08% precision, 92.46% recall, and an F1 score of 90.18% on the low-resource Zhuang topic-classification task, demonstrating its effectiveness.
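The GCN stage described above can be illustrated with a minimal sketch. This is not the authors' implementation: the adjacency matrix, feature dimensions, and weights below are hypothetical stand-ins, with the node features playing the role of the BiGRU representations and the final Softmax producing topic probabilities, following the standard two-layer GCN propagation rule of Kipf and Welling.

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetric normalization with self-loops: A_hat = D^{-1/2} (A + I) D^{-1/2}
    A_loop = A + np.eye(A.shape[0])
    d = A_loop.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_loop @ D_inv_sqrt

def softmax(z):
    # Row-wise softmax over class scores
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A, X, W1, W2):
    # Two-layer GCN: ReLU(A_norm X W1), then A_norm H W2 -> class probabilities.
    # Propagating over A_norm mixes labeled and unlabeled document nodes,
    # which is the label-propagation effect the abstract refers to.
    A_norm = normalize_adjacency(A)
    H = np.maximum(A_norm @ X @ W1, 0.0)
    return softmax(A_norm @ H @ W2)

rng = np.random.default_rng(0)
# Hypothetical toy graph: 4 document nodes on a chain
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 8))    # node features (stand-in for BiGRU outputs)
W1 = rng.normal(size=(8, 16))  # first-layer weights (untrained, for shape only)
W2 = rng.normal(size=(16, 3))  # second-layer weights, 3 hypothetical topic classes
probs = gcn_forward(A, X, W1, W2)
```

Each row of `probs` is a probability distribution over the topic classes for one document node; in the full method these weights would be trained jointly with the ZHA_BERT and BiGRU components rather than sampled at random.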