Computer Science, 2024, Vol. 51, Issue (6A): 230800011-9. doi: 10.11896/jsjkx.230800011
JIANG Haoda (蒋昊达), ZHAO Chunlei (赵春蕾), CHEN Han (陈瀚), WANG Chundong (王春东)
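Abstract: Building a domain sentiment lexicon is the foundation of sentiment analysis for domain-specific text. Existing construction methods have three problems: the selected candidate sentiment words are highly redundant, their polarity is judged inaccurately, and the methods depend strongly on the domain. To make the selected candidate sentiment words more domain-specific and to judge their polarity more accurately, this paper proposes a domain sentiment lexicon construction method based on an improved term frequency-inverse document frequency (TF-IDF) algorithm and BERT. In the candidate screening stage, the method improves the TF-IDF algorithm and combines it with latent Dirichlet allocation (LDA) to apply a domain-specificity correction, which makes the selected candidate sentiment words more domain-specific. In the polarity judgment stage, it combines the semantic orientation pointwise mutual information (SO-PMI) algorithm with BERT, using domain sentiment words to fine-tune a BERT classification model, which improves the accuracy of polarity judgments for domain candidate sentiment words. Experiments on user review datasets from different domains show that the method improves the quality of the constructed domain sentiment lexicon: when the lexicons it builds are used for sentiment analysis of automotive and mobile phone text, the F1 scores reach 78.02% and 88.35%, respectively.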
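The abstract does not give the improved TF-IDF formula or the exact form of the LDA-based domain correction, so the following is only a minimal sketch of the candidate screening idea under stated assumptions. `tfidf_candidates` uses standard TF-IDF (term frequency normalized by document length, with the smoothed IDF `ln((1+N)/(1+df)) + 1` that scikit-learn uses when `smooth_idf=True`) and keeps each word's highest score in any document. `domain_correct` is a hypothetical correction of our own: it multiplies each score by `1 + lam * max_k phi_k(w)`, where `phi_k` are LDA topic-word distributions assumed to be computed elsewhere. The toy corpus, the function names, and `lam` are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def tfidf_candidates(docs, top_k=10):
    """Rank words by their highest TF-IDF score in any document of a domain corpus.

    docs: list of token lists (assumed to be already word-segmented).
    Returns up to top_k (word, score) pairs, highest score first.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each word.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    best = {}
    for doc in docs:
        if not doc:
            continue
        tf = Counter(doc)
        for word, count in tf.items():
            # Smoothed IDF, so a word found in every document keeps a small positive weight.
            idf = math.log((1 + n_docs) / (1 + df[word])) + 1
            score = (count / len(doc)) * idf
            best[word] = max(best.get(word, 0.0), score)

    # Descending score; ties broken alphabetically for reproducible output.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def domain_correct(ranked, topic_word, lam=1.0):
    """HYPOTHETICAL LDA-based domain correction (an assumption, not the paper's formula).

    topic_word: list of dicts mapping word -> probability, one dict per LDA topic.
    Each score is multiplied by (1 + lam * max_k phi_k(word)).
    """
    corrected = []
    for word, score in ranked:
        # Highest probability of the word under any topic; 0 if no topic lists it.
        p = max((topic.get(word, 0.0) for topic in topic_word), default=0.0)
        corrected.append((word, score * (1 + lam * p)))
    return sorted(corrected, key=lambda kv: (-kv[1], kv[0]))


docs = [
    ["screen", "sharp", "screen"],
    ["battery", "sharp"],
    ["screen", "laggy"],
]
ranked = tfidf_candidates(docs)
print([w for w, _ in ranked])  # ['screen', 'battery', 'laggy', 'sharp']
corrected = domain_correct(ranked, [{"sharp": 0.5, "screen": 0.1}])
print([w for w, _ in corrected])  # ['sharp', 'screen', 'battery', 'laggy']
```

In this toy run the hypothetical topic weights move "sharp" from last to first, which is the kind of reordering a domain-specificity correction is meant to produce; the paper's actual weighting may differ.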
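For the polarity judgment stage, the standard SO-PMI score of a word w is the sum of its PMI with positive seed words minus the sum of its PMI with negative seed words, where PMI(a, b) = log2(P(a, b) / (P(a)P(b))). The sketch below implements that standard formulation only. Counting co-occurrence at the document level, the smoothing constant `eps`, the seed words, and the toy reviews are assumptions made for illustration, and the paper's combination of SO-PMI with a fine-tuned BERT classifier is not reproduced.

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds, eps=0.01):
    """Standard SO-PMI with document-level co-occurrence.

    SO-PMI(w) = sum_p PMI(w, p) - sum_n PMI(w, n)
    PMI(a, b) = log2(P(a, b) / (P(a) * P(b)))
    eps is an assumed smoothing constant added to the joint count to avoid log2(0).
    """
    doc_sets = [set(doc) for doc in docs]
    n = len(doc_sets)

    def count(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    def pmi(a, b):
        ca, cb = count(a), count(b)
        if ca == 0 or cb == 0:
            return 0.0  # PMI is undefined for an unseen word; treat it as no evidence.
        joint = (count(a, b) + eps) / n
        return math.log2(joint / ((ca / n) * (cb / n)))

    return sum(pmi(word, p) for p in pos_seeds) - sum(pmi(word, q) for q in neg_seeds)


def polarity(score, threshold=0.0):
    """Map an SO-PMI score to a polarity label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


reviews = [
    ["sharp", "good"],
    ["sharp", "good", "screen"],
    ["laggy", "bad"],
    ["laggy", "bad", "screen"],
]
for w in ["sharp", "laggy", "screen"]:
    print(w, polarity(so_pmi(w, reviews, ["good"], ["bad"])))
# sharp positive
# laggy negative
# screen neutral
```

A word that co-occurs equally with both seed sets, like "screen" here, scores exactly zero and is labeled neutral. The abstract states that domain sentiment words are used to fine-tune a BERT classification model; words with clearly positive or negative SO-PMI scores are one plausible source of such labels, but how the paper selects them is not specified here.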