Computer Science (计算机科学), 2019, Vol. 46, Issue 3: 275-282. doi: 10.11896/j.issn.1002-137X.2019.03.041
LV Jia-gao (吕佳高)1, LIANG Kui-yang (梁奎阳)2, CAI Wei (蔡伟)3
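Abstract: With the rapid development of science and technology, the volume of scientific literature grows daily, and mining frontier technology keywords from massive literature data is a new challenge. Manual analysis by experts is the common, traditional approach, but it is inefficient and costly. This paper proposes an algorithm that combines bibliometrics with crowdsourcing. First, part-of-speech tagging from natural language processing is used to process the literature and extract its nouns. Next, a bibliometrics-based technology monitoring method screens out potential technology keywords. Finally, data from a crowdsourcing platform is used to filter those potential keywords further. Experiments on English-language literature from the computer science and biomedical fields show that the proposed algorithm is effective to a degree, is more efficient than manual analysis, and can assist experts in their manual analysis. The proposed algorithm can better guide the mining of frontier technology keywords and provides a reference for more automated and intelligent frontier technology keyword mining in the future.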
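The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the monitoring method or the crowdsourcing data format, so the year-over-year frequency growth ratio, the thresholds (`min_growth`, `min_count`, `min_approval`), the 0/1 vote lists, and all function names are assumptions made for illustration. The input is assumed to be already POS-tagged with Penn Treebank tags by some tagger.

```python
from collections import Counter

# Penn Treebank noun tags; input tokens are assumed to be POS-tagged already.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def extract_nouns(tagged_tokens):
    """Stage 1: keep the lowercased nouns from (word, tag) pairs."""
    return [word.lower() for word, tag in tagged_tokens if tag in NOUN_TAGS]

def candidate_keywords(docs_by_year, early_year, late_year,
                       min_growth=2.0, min_count=2):
    """Stage 2 (assumed stand-in): keep nouns whose frequency grew sharply."""
    early = Counter(n for doc in docs_by_year[early_year] for n in extract_nouns(doc))
    late = Counter(n for doc in docs_by_year[late_year] for n in extract_nouns(doc))
    candidates = []
    for noun, count in late.items():
        growth = count / (early[noun] + 1)  # add-one smoothing avoids dividing by zero
        if count >= min_count and growth >= min_growth:
            candidates.append(noun)
    return sorted(candidates)

def crowd_filter(candidates, votes, min_approval=0.5):
    """Stage 3 (assumed stand-in): keep candidates approved by most crowd workers."""
    kept = []
    for noun in candidates:
        ballots = votes.get(noun, [])  # 1 = "frontier term", 0 = "not"
        if ballots and sum(ballots) / len(ballots) > min_approval:
            kept.append(noun)
    return kept

# Toy, hand-made data (hypothetical; not from the paper's experiments).
docs_by_year = {
    2017: [[("the", "DT"), ("network", "NN"), ("improves", "VBZ"), ("accuracy", "NN")]],
    2018: [
        [("blockchain", "NN"), ("network", "NN"), ("method", "NN")],
        [("blockchain", "NN"), ("improves", "VBZ"), ("accuracy", "NN")],
        [("accuracy", "NN"), ("blockchain", "NN"), ("method", "NN")],
    ],
}
votes = {"blockchain": [1, 1, 0], "method": [0, 0, 1]}

candidates = candidate_keywords(docs_by_year, 2017, 2018)
frontier = crowd_filter(candidates, votes)
print(candidates)  # ['blockchain', 'method']
print(frontier)    # ['blockchain']
```

In this toy run the frequency stage passes both "blockchain" and the generic noun "method", and the crowd votes then discard "method". That is the role the abstract assigns to the crowdsourcing step: a second filter on top of the bibliometric screening.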