Computer Science ›› 2019, Vol. 46 ›› Issue (6): 29-34. doi: 10.11896/j.issn.1002-137X.2019.06.003

Newly-emerging Domain Word Detection Method Based on Syntactic Analysis and Term Vector

ZHAO Zhi-bin1, SHI Yu-xin1, LI Bin-yang2   

  1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
  2. School of Information Science and Technology, University of International Relations, Beijing 100091, China
  • Received: 2018-08-18 Published: 2019-06-24

Abstract: Existing words and phrases may come to be used in a domain in which they have never appeared before. Such words and phrases are called newly-emerging domain words. They give researchers insight into the latest development trends and public opinion in a domain, so detecting them is important. This paper proposes a method for detecting newly-emerging domain words based on dependency syntactic analysis and term vectors. First, the concept of a syntactic dictionary is introduced, together with a method for constructing it for a specific domain from the dependency syntax of sentences and the TF-IDF values of a training corpus. Next, the domain syntactic dictionary and term vectors are used to detect newly-emerging domain words. Experiments on comment data from a skin-care products forum show that the syntactic dictionary is effective and that the proposed method performs well in detecting newly-emerging domain words.
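
The abstract names two ingredients, TF-IDF values over a training corpus and term vectors, without giving formulas. The sketch below is only an illustration of those two ingredients under stated assumptions, not the authors' method: it uses the common tf × log(N/df) weighting, compares term vectors by cosine similarity, and omits the dependency-parsing step entirely. The function names, the toy corpus, the toy vectors, and the 0.8 threshold are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf score} dict per tokenized document.

    Assumes the common weighting tf * log(N / df); the paper's exact
    formula is not given in the abstract.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def detect_candidates(vectors, known_domain_words, threshold=0.8):
    """Flag words whose vector lies close to some known domain word.

    The threshold is a hypothetical value chosen for this toy example.
    """
    flagged = []
    for word, vec in vectors.items():
        if word in known_domain_words:
            continue
        best = max(
            (cosine(vec, vectors[k]) for k in known_domain_words if k in vectors),
            default=0.0,
        )
        if best >= threshold:
            flagged.append(word)
    return sorted(flagged)


# Toy, made-up data for illustration only.
docs = [
    ["cream", "moisturizes", "skin"],
    ["serum", "moisturizes", "skin"],
    ["phone", "battery", "lasts"],
]
vectors = {
    "cream": [0.9, 0.1],
    "serum": [0.85, 0.2],
    "battery": [0.1, 0.95],
}

weights = tf_idf(docs)
candidates = detect_candidates(vectors, {"cream"})
```

In this toy setup a term that occurs in a single document receives a higher TF-IDF weight than one shared across documents, and only the word whose vector points in nearly the same direction as a known domain word is flagged. In practice the vectors would come from a trained embedding model rather than being written by hand.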

Key words: Newly-emerging domain words, Syntactic analysis, Syntactic dictionary, Term vector

CLC Number: 

  • TP391