Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230600111-8. doi: 10.11896/jsjkx.230600111

• Artificial Intelligence •

Text Sentiment Analysis Model Fusing Topic Features

YANG Junzhe1, SONG Ying2, CHEN Yifei2

  1 School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
  2 School of Automation, Wuxi University, Wuxi, Jiangsu 214105, China
  • Published: 2024-06-06
  • Corresponding author: SONG Ying (sypeace@126.com)
  • About author: (20211249590@nuist.edu.cn)
  • Supported by:
    Natural Science Foundation of the Jiangsu Higher Education Institutions of China (19KJB520044); Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX23_0392)

Text Sentiment Analysis Model Fusing Topic Features

YANG Junzhe1, SONG Ying2, CHEN Yifei2   

  1 School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
  2 School of Automation, Wuxi University, Wuxi, Jiangsu 214105, China
  • Published: 2024-06-06
  • About author: YANG Junzhe, born in 1999, postgraduate, is a member of CCF (No.P2586G). His main research interests include sentiment analysis and topic classification.
    SONG Ying, born in 1979, Ph.D, postgraduate supervisor, is a member of CCF (No.P2602M). Her main research interests include computer vision and digital twins.
  • Supported by:
    Natural Science Foundation of the Jiangsu Higher Education Institutions of China (19KJB520044) and Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX23_0392).

Abstract: With the rapid development of large language models, how to reduce the number of model parameters while maintaining model performance has become an important challenge in the field of natural language processing. However, existing parameter compression techniques often struggle to balance model stability and generalization ability. To this end, a new sentiment analysis architecture that fuses topic features is proposed, aiming to use topic information to strengthen the model's ability to judge the sentiment polarity of text. Specifically, a method combining LDA and K-means is adopted to extract the topic features of a text, and these features are concatenated with the word embeddings as a fixed-dimensional vector to obtain a new word vector representation. Average pooling is then used to construct a sentence-level representation vector, which is fed into a fully connected layer for sentiment classification. To verify the effectiveness of the proposed model, comparative experiments with multiple baseline algorithms are conducted on public sentiment analysis datasets. Experimental results show that the proposed model clearly outperforms ALBERT on multiple datasets, improving accuracy by about 3.5%, and maintains high stability and generalization ability with only a slight increase in the number of parameters.

Keywords: sentiment analysis, ALBERT model, LDA model, topic features, average pooling

Abstract: With the rapid development of large-scale language models, how to reduce the number of model parameters while ensuring model performance has become an important challenge in the field of natural language processing. However, existing parameter compression techniques often struggle to balance the stability and generalization ability of the model. To this end, this paper proposes a new framework for sentiment analysis that integrates topic features, aiming to use topic information to enhance the model's ability to judge text sentiment polarity. Specifically, a method combining LDA and K-means is used to extract the topic features of the text, which are concatenated, as a fixed-dimensional vector, with the word embeddings to obtain a new word vector representation. Sentence-level representation vectors are then constructed using average pooling and fed into a fully connected layer for sentiment classification. To verify the effectiveness of the proposed model, comparative experiments with multiple benchmark algorithms are carried out on public sentiment analysis datasets. Experimental results show that the proposed model significantly outperforms ALBERT on multiple datasets, with accuracy improved by about 3.5%, and it maintains high stability and generalization ability with only a small increase in the number of parameters.
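
To make the topic-feature step concrete, the following is a minimal Python sketch of one plausible reading of the LDA + K-means procedure described above, not the authors' released code: LDA yields a document-topic distribution, K-means clusters those distributions, and the centroid of a document's cluster is taken as its fixed-dimensional topic vector. The function name topic_features and all hyperparameters (n_topics, n_clusters, max_features) are illustrative assumptions.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

def topic_features(corpus, n_topics=10, n_clusters=5, seed=0):
    """Hypothetical LDA + K-means topic-feature extractor (a sketch, not the paper's code)."""
    # Bag-of-words counts are the standard input to LDA.
    counts = CountVectorizer(max_features=5000).fit_transform(corpus)
    # Document-topic distribution theta: shape (n_docs, n_topics).
    theta = LatentDirichletAllocation(n_components=n_topics, random_state=seed).fit_transform(counts)
    # K-means groups documents with similar topic mixtures; each cluster centroid
    # serves as a smoothed, fixed-dimensional topic vector for its member documents.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(theta)
    return km.cluster_centers_[labels]   # one (n_topics,)-dimensional topic vector per document

if __name__ == "__main__":
    docs = ["the battery life of this phone is great",
            "terrible service, I want a refund",
            "the screen is bright and the camera is sharp"]
    print(topic_features(docs, n_topics=4, n_clusters=2).shape)   # (3, 4)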

Key words: Sentiment analysis, ALBERT model, Latent Dirichlet allocation, Topic features, Average pooling
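
The fusion and classification stage can be sketched as follows, again only as an illustration under stated assumptions: the HuggingFace albert-base-v2 checkpoint stands in for the ALBERT encoder, and the names TopicFusionClassifier, topic_dim, and num_classes are hypothetical. The document-level topic vector is broadcast to every token embedding and concatenated with it, masked average pooling builds the sentence-level representation, and a single fully connected layer outputs the sentiment logits, mirroring the pipeline described in the abstract.

import torch
import torch.nn as nn
from transformers import AlbertModel, AutoTokenizer

class TopicFusionClassifier(nn.Module):
    """Sketch of the fusion architecture described in the abstract (hypothetical names)."""
    def __init__(self, topic_dim, num_classes=2, model_name="albert-base-v2"):
        super().__init__()
        self.encoder = AlbertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size            # 768 for albert-base-v2
        self.classifier = nn.Linear(hidden + topic_dim, num_classes)

    def forward(self, input_ids, attention_mask, topic_vec):
        # Token-level embeddings from ALBERT: (batch, seq_len, hidden).
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Broadcast the document-level topic vector to every token and concatenate.
        topics = topic_vec.unsqueeze(1).expand(-1, tokens.size(1), -1)
        fused = torch.cat([tokens, topics], dim=-1)          # (batch, seq_len, hidden + topic_dim)
        # Masked average pooling -> one sentence-level vector per document.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (fused * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.classifier(pooled)                       # sentiment logits

# Usage sketch: the topic vector would come from the LDA/K-means step shown above.
tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
batch = tokenizer(["the screen is bright and the camera is sharp"],
                  return_tensors="pt", padding=True, truncation=True)
model = TopicFusionClassifier(topic_dim=4)
logits = model(batch["input_ids"], batch["attention_mask"], torch.rand(1, 4))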

CLC Number: TP391