Computer Science, 2020, Vol. 47, Issue (11): 95-100. doi: 10.11896/jsjkx.190900012

• Database & Big Data & Data Science

Domain Label Acquisition Method Based on SL-LDA Model

WANG Sheng (1), ZHANG Yang-sen (1, 2), ZHANG Wen (1), JIANG Yu-ru (1, 2), ZHANG Rui (1)

  1. Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China
  2. Beijing Laboratory of National Economic Security Early Warning Engineering, Beijing 100044, China
  • Received: 2019-08-31  Revised: 2019-11-04  Online: 2020-11-15  Published: 2020-11-05
  • About author: WANG Sheng, born in 1996, is a postgraduate student. His main research interests include natural language processing and machine learning.
    ZHANG Yang-sen, born in 1962, postdoctoral researcher, professor and Ph.D. supervisor, is a member of the China Computer Federation (CCF). His main research interests include natural language processing and artificial intelligence.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61772081, 61602044) and the Beijing Laboratory Project "Construction of Technological Innovation Service Capability - Construction of Research Base - Beijing Laboratory - National Economic Security Early Warning Project" (PXM2018_014224_000010).

Abstract: Advances in science and technology pose new challenges for managing scientific literature and scholars. To automate the management of massive collections of scientific literature and their authors, this paper proposes a domain label acquisition method based on SL-LDA. Using a large corpus of scientific literature, the distribution characteristics of the data are analyzed, and the SL-LDA topic model is constructed by introducing the word frequency feature of scientific literature. The topic model extracts “topic-phrase” pairs from the papers of each scholar to obtain initial domain keywords. A domain classification system is then introduced: the topic model's extraction results and the system's labels are represented as vectors, the keywords are weighted by their position features, and vector similarity is used to map them onto the system, which yields the scholar's domain labels. Experimental results show that, given the same amount of literature data, the label words obtained with SL-LDA are more accurate than those produced by the traditional LDA model, the statistics-based TF-IDF algorithm and the graph-based TextRank algorithm, and the F1 score reaches 0.572. This indicates that the SL-LDA-based domain label acquisition method applies well to the academic field.
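The final step in the abstract maps topic-model keywords onto the labels of a domain classification system using position-weighted vector similarity. The sketch below illustrates that idea only under stated assumptions, not as the paper's SL-LDA procedure (which the abstract does not specify): keyword and label vectors are given (e.g., from a word2vec-style model), keywords arrive ordered by topic-model relevance, and the hypothetical `position_weight` (1 / (1 + rank)), summed cosine scoring, and the set-based `set_f1` helper are illustrative choices.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def position_weight(rank: int) -> float:
    """Hypothetical position weighting: earlier-ranked keywords count more."""
    return 1.0 / (1 + rank)


def map_to_domain_labels(
    keywords: List[Tuple[str, Vector]],
    labels: Dict[str, Vector],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every system label by position-weighted similarity to the keywords.

    `keywords` is assumed to be ordered by topic-model relevance (rank 0 first).
    Returns the `top_k` (label, score) pairs, highest score first.
    """
    scores: Dict[str, float] = {}
    for name, label_vec in labels.items():
        scores[name] = sum(
            position_weight(rank) * cosine(vec, label_vec)
            for rank, (_, vec) in enumerate(keywords)
        )
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def set_f1(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Set-based F1 between predicted and gold labels (one common convention)."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-D vectors purely for illustration; real ones would come from an embedding model.
    keywords = [
        ("topic model", (1.0, 0.0)),
        ("word embedding", (1.0, 1.0)),
        ("image retrieval", (0.0, 1.0)),
    ]
    labels = {
        "Natural language processing": (1.0, 0.0),
        "Computer vision": (0.0, 1.0),
    }
    predicted = [name for name, _ in map_to_domain_labels(keywords, labels, top_k=1)]
    print(predicted)
    print(set_f1(predicted, ["Natural language processing", "Machine learning"]))
```

Summing over all keywords lets a label that matches several mid-ranked keywords outrank one that matches only the top keyword; a max-based aggregate would behave differently, and the abstract does not say which aggregation the authors used.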

Key words: Domain tags, Label mapping, Scientific literature, SL-LDA model, Theme phrase extraction

CLC Number: TP391.1