Computer Science, 2015, Vol. 42, Issue (9): 208-213. doi: 10.11896/j.issn.1002-137X.2015.09.040

• Artificial Intelligence •

New Word Detection and Emotional Tendency Judgment Based on Deep Structured Model

SUN Xiao, SUN Chong-yuan and REN Fu-ji

  1. School of Computer and Information, Hefei University of Technology, Hefei 230009
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    National Natural Science Foundation of China (61203315) and the National 863 Program (2012AA011103)

Abstract: As social networks develop, new words keep appearing. A new word often marks a social hot spot and carries a certain public mood, so detecting new words and judging their emotional tendency offers a new approach to public mood prediction. For detection, we construct a deep conditional random field model for sequence labeling, introduce part of speech, character position, and word-formation ability as features, and combine the model with third-party dictionaries such as crowdsourced online dictionaries. Traditional methods based on emotion dictionaries have difficulty judging the emotional tendency of new words. We therefore use a neural network language model to represent each word as a K-dimensional vector, find the words nearest to the new word in the vector space, and judge the new word's emotional tendency from the emotional tendency of those neighbors and their distance to the new word. Experiments on the Peking University corpus demonstrate the effectiveness of the proposed model and method: the F-value of new word detection is 0.991, and the accuracy of emotion recognition is 70%.
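The nearest-neighbor idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity measure, the similarity-weighted vote, the function names, and the toy 3-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a trained neural network language model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def judge_polarity(new_vec, lexicon_vecs, lexicon_polarity, k=3):
    """Score a new word from its k nearest lexicon words.

    Each neighbor votes with its known polarity (+1 or -1), weighted by its
    similarity to the new word. Returns a value in [-1, 1]; a positive value
    suggests positive sentiment, a negative value suggests negative sentiment.
    """
    scored = sorted(
        ((cosine_similarity(new_vec, vec), word) for word, vec in lexicon_vecs.items()),
        reverse=True,
    )[:k]
    # Neighbors with non-positive similarity get zero weight (an assumption).
    total_weight = sum(max(sim, 0.0) for sim, _ in scored)
    if total_weight == 0.0:
        return 0.0
    weighted = sum(max(sim, 0.0) * lexicon_polarity[word] for sim, word in scored)
    return weighted / total_weight


# Hypothetical toy data standing in for trained word vectors and an emotion dictionary.
TOY_VECS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "awful": [-0.8, 0.2, 0.1],
}
TOY_POLARITY = {"good": 1, "great": 1, "bad": -1, "awful": -1}

positive_score = judge_polarity([0.85, 0.15, 0.05], TOY_VECS, TOY_POLARITY, k=2)
negative_score = judge_polarity([-0.85, 0.15, 0.05], TOY_VECS, TOY_POLARITY, k=2)
```

Weighting each vote by similarity lets close neighbors dominate the decision, which matches the abstract's statement that both the neighbors' emotional tendency and their distance to the new word are used.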

Key words: New word detection, Conditional random fields, Deep structured model, Emotional tendency judgment, Neural network language model
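The sequence-labeling view of new word detection named in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the B/M/E/S tag scheme, the feature names, and the helper functions are hypothetical and are not taken from the paper, and the tag sequence is supplied by hand where a trained conditional random field would normally predict it.

```python
def char_features(chars, pos_tags, formation_score):
    """Build one feature dict per character for a sequence labeler.

    pos_tags holds a part-of-speech tag per character and formation_score maps
    a character to a word-formation score; both inputs are hypothetical here.
    """
    features = []
    for i, ch in enumerate(chars):
        features.append({
            "char": ch,
            "prev_char": chars[i - 1] if i > 0 else "<BOS>",
            "next_char": chars[i + 1] if i + 1 < len(chars) else "<EOS>",
            "pos": pos_tags[i],
            "formation": formation_score.get(ch, 0.0),
        })
    return features


def decode_bmes(chars, tags):
    """Turn per-character B/M/E/S tags into a list of words."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character word
            if current:
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":          # beginning of a multi-character word
            if current:
                words.append(current)
            current = ch
        elif tag == "M":          # middle of a word
            current += ch
        else:                     # "E": end of a word
            words.append(current + ch)
            current = ""
    if current:                   # flush an unterminated word
        words.append(current)
    return words


def new_word_candidates(words, dictionary):
    """Keep multi-character words that the dictionary does not contain."""
    return [w for w in words if len(w) > 1 and w not in dictionary]


# Hand-written example; a trained model would predict the tags.
chars = list("我喜欢给力的电影")
tags = ["S", "B", "E", "B", "E", "S", "B", "E"]
known_words = {"我", "喜欢", "的", "电影"}

words = decode_bmes(chars, tags)
candidates = new_word_candidates(words, known_words)
```

Filtering the decoded words against a dictionary mirrors the abstract's point that third-party dictionaries are combined with the labeling model, although the exact way the paper combines them is not specified here.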

