Computer Science, 2020, Vol. 47, Issue (11): 95-100. doi: 10.11896/jsjkx.190900012
王胜1, 张仰森1,2, 张雯1, 蒋玉茹1,2, 张睿1
WANG Sheng1, ZHANG Yang-sen1,2, ZHANG Wen1, JIANG Yu-ru1,2, ZHANG Rui1
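Abstract: Advances in science and technology pose new challenges for managing the scientific literature and its authors. To support automatic management of massive collections of scientific literature and scholars, this paper proposes a domain label acquisition method based on SL-LDA. Starting from a large corpus of scientific literature, the distribution characteristics of the data are analyzed, and the SL-LDA topic model is built by introducing word frequency features of the literature. The model is applied to the papers of a single scholar to perform "topic-phrase" extraction, which yields the initial domain keywords. A domain taxonomy is then introduced: the extraction results of the topic model and the taxonomy labels are represented as vectors, weighted by position features, and mapped onto the taxonomy by similarity, which gives the scholar's final domain labels. Experimental results show that, on the same amount of literature data, the SL-LDA model obtains better label words with higher accuracy than the traditional LDA model, the statistics-based TFIDF algorithm, and the graph-based Text-Rank algorithm, and the F1 score rises to 0.572. This indicates that the SL-LDA-based domain label extraction method is well suited to the academic domain.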
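The taxonomy-mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the position weights are computed or how vectors are obtained, so the function names (`cosine`, `map_to_labels`), the toy two-dimensional vectors, and the weight values below are all assumptions made for illustration. The sketch scores each taxonomy label by the weight-averaged cosine similarity between the label vector and the extracted keyword vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def map_to_labels(keywords, label_vectors, top_k=1):
    """Rank taxonomy labels for one scholar.

    keywords: list of (vector, weight) pairs; the weight stands in for a
              position-based weight (hypothetical, not the paper's formula).
    label_vectors: dict mapping a taxonomy label to its vector.
    Returns the top_k (label, score) pairs, highest score first.
    """
    total_weight = sum(w for _, w in keywords)
    scores = {}
    for label, label_vec in label_vectors.items():
        weighted = sum(w * cosine(vec, label_vec) for vec, w in keywords)
        scores[label] = weighted / total_weight if total_weight else 0.0
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Toy data (assumed): two extracted keywords and two taxonomy labels.
keywords = [([1.0, 0.0], 2.0), ([0.8, 0.6], 1.0)]
label_vectors = {
    "natural language processing": [1.0, 0.0],
    "computer vision": [0.0, 1.0],
}
best = map_to_labels(keywords, label_vectors)
```

With this toy data the first label is ranked highest, because both keyword vectors point mostly along its direction and the more heavily weighted keyword matches it exactly. In practice the vectors would come from a trained word embedding model rather than being written by hand.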