Computer Science (计算机科学) ›› 2019, Vol. 46 ›› Issue (10): 19-26. doi: 10.11896/jsjkx.191000531C
喻影1, 陈珂1,2, 寿黎但1,2, 陈刚1,2, 吴晓凡3
YU Ying1, CHEN Ke1,2, SHOU Li-dan1,2, CHEN Gang1,2, WU Xiao-fan3
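Abstract: A central task in sentiment analysis is to determine the sentiment polarity (positive or negative) of a document from its content. Different words and sentences contribute to a document's sentiment to different degrees, so accurately extracting the words and sentences most relevant to sentiment classification, and thereby improving classification performance, is an important problem. In the supervised experiments, the logical structure of each sentence is analyzed through its dependency syntax relations, and the words more relevant to expressing sentiment are extracted and weighted, which improves classification performance. In the semi-supervised experiments, a key-sentence extraction and classifier fusion algorithm for Chinese reviews extracts the key sentences of a document, namely those that contain more sentiment words and carry a summarizing tone. The extraction takes into account each sentence's sentiment-word, position, punctuation, and keyword attributes, and the classifier fusion algorithm lets the sub-classifier with the highest confidence determine the classification result. On datasets from Dianping (大众点评网) and Toutiao news (头条新闻), the proposed algorithms are compared with existing classical algorithms and achieve higher performance, which demonstrates the effectiveness of keyword extraction based on dependency parsing and of feature-based Chinese key-sentence extraction.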
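The abstract names four sentence-level attributes (sentiment words, position, punctuation, keywords) and a fusion rule in which the most confident sub-classifier decides. The sketch below illustrates one way these two ideas could fit together. It is not the authors' implementation: the toy lexicons, the equal feature weights, the choice of "first or last sentence" as the position cue, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
# Hypothetical toy lexicons; the resources used in the paper are not given in the abstract.
SENTIMENT_WORDS = {"好吃", "喜欢", "失望", "难吃", "推荐"}
SUMMARY_KEYWORDS = {"总之", "总的来说", "总体"}
SENTENCE_ENDINGS = "。！？!?"


def split_sentences(text):
    """Split text after sentence-ending punctuation, keeping the punctuation."""
    sentences, current = [], ""
    for ch in text:
        current += ch
        if ch in SENTENCE_ENDINGS:
            sentences.append(current)
            current = ""
    if current.strip():
        sentences.append(current)
    return sentences


def score_sentence(sentence, index, total):
    """Add up four illustrative feature scores (equal weights are an assumption)."""
    sentiment = sum(sentence.count(word) for word in SENTIMENT_WORDS)
    position = 1.0 if index in (0, total - 1) else 0.0  # assumed cue: first/last sentence
    punctuation = 1.0 if ("！" in sentence or "!" in sentence) else 0.0
    keyword = 1.0 if any(k in sentence for k in SUMMARY_KEYWORDS) else 0.0
    return sentiment + position + punctuation + keyword


def extract_key_sentence(text):
    """Return the highest-scoring sentence, or an empty string for empty input."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    scores = [score_sentence(s, i, len(sentences)) for i, s in enumerate(sentences)]
    best = max(range(len(sentences)), key=lambda i: scores[i])
    return sentences[best]


def fuse_by_confidence(predictions):
    """Given (label, confidence) pairs, let the most confident sub-classifier decide."""
    label, _ = max(predictions, key=lambda pair: pair[1])
    return label


if __name__ == "__main__":
    review = "今天去了这家店。环境一般。总之很好吃，推荐！"
    print(extract_key_sentence(review))
    print(fuse_by_confidence([("positive", 0.62), ("negative", 0.91)]))
```

In a real system the sub-classifier confidences would come from trained models (for example, one trained on key sentences and one on full documents, which is itself an assumption here), and the feature weights would be tuned rather than fixed at one.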