Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230700064-5. doi: 10.11896/jsjkx.230700064

• Artificial Intelligence

Study on Tibetan Short Text Classification Based on DAN and FastText

LI Guo1,2, CHEN Chen1,2, YANG Jing1,3, QUN Nuo1   

  1 School of Information Science and Technology, Tibet University, Lhasa 850000, China
  2 Engineering Research Center of Tibetan Information Technology, Ministry of Education, Tibet University, Lhasa 850000, China
  3 School of Cyber Science and Engineering, Sichuan University, Chengdu 610000, China
  • Published: 2024-06-06
  • About author: LI Gu, born in 1994, postgraduate. His main research interests include natural language processing.
    YANG Jing, born in 1980, professor. His main research interests include cyberspace security and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61872254, 62162057).

Abstract: As Tibetan-language information becomes increasingly integrated into social life, more and more Tibetan short text data is available on online platforms. To address the low performance of traditional classification methods on Tibetan short texts, a Tibetan short text classification model based on DAN-FastText is proposed. The model first uses the FastText network to perform unsupervised training on a large-scale Tibetan corpus, yielding a set of pre-trained Tibetan syllable vectors. It then uses this set to convert Tibetan short texts into syllable vectors, feeds them into a deep averaging network (DAN), and fuses the sentence vector features trained by the FastText network in the output stage. Finally, classification is completed through a fully connected layer and a softmax layer. On the news headline dataset of the publicly available Tibetan News Classification Corpus (TNCC), the model achieves a Macro-F1 of 64.53%, which is 2.81% higher than that of the TiBERT model and 6.14% higher than that of the GCN model, indicating that the fusion model classifies Tibetan short texts more effectively.
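The pipeline described in the abstract (average the syllable vectors, pass them through a feed-forward layer, fuse with a sentence vector, then classify with a fully connected layer and softmax) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. It assumes a single hidden layer, fusion by simple concatenation, and random stand-ins for the pre-trained FastText syllable and sentence vectors; the class name `DanFastTextSketch` and all dimensions are hypothetical.

```python
import math
import random


def mean_vector(vectors):
    """Average equal-length vectors (the DAN composition step)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class DanFastTextSketch:
    """Hypothetical DAN + sentence-vector fusion classifier (untrained)."""

    def __init__(self, embed_dim, hidden_dim, sent_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.w_hidden = random_matrix(hidden_dim, embed_dim, rng)
        self.b_hidden = [0.0] * hidden_dim
        # The output layer sees the DAN features and the sentence vector together.
        self.w_out = random_matrix(num_classes, hidden_dim + sent_dim, rng)
        self.b_out = [0.0] * num_classes

    def predict_proba(self, syllable_vectors, sentence_vector):
        averaged = mean_vector(syllable_vectors)
        hidden = relu(dense(averaged, self.w_hidden, self.b_hidden))
        # Assumption: fusion is plain concatenation; the abstract does not say.
        fused = hidden + list(sentence_vector)
        return softmax(dense(fused, self.w_out, self.b_out))


# Random stand-ins for pre-trained vectors: 5 syllables, 8 dimensions each.
rng = random.Random(1)
syllables = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
sentence = [rng.uniform(-1, 1) for _ in range(8)]

model = DanFastTextSketch(embed_dim=8, hidden_dim=16, sent_dim=8, num_classes=3)
probs = model.predict_proba(syllables, sentence)
print(probs)
```

In a real system the weights would be learned from labeled data, and the syllable and sentence vectors would come from the FastText model trained on the Tibetan corpus rather than from a random generator.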

Key words: Tibetan short text classification, Feature fusion, Deep averaging networks, FastText

CLC Number: 

  • TP391.1