Computer Science, 2019, Vol. 46, Issue (3): 275-282. doi: 10.11896/j.issn.1002-137X.2019.03.041

• Artificial Intelligence •

Frontier Scientific Keyword Extraction Based on Bibliometric and Crowdsourcing

LV Jia-gao (1), LIANG Kui-yang (2), CAI Wei (3)

  1. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
  2. Beijing Guoke Zhiyuan Technology Co., Ltd., Beijing 100191, China
  3. Beijing Sci-Tech Information Center, Beijing 100085, China
  • Received: 2018-02-03 Revised: 2018-05-28 Online: 2019-03-15 Published: 2019-03-22

Abstract: As science develops rapidly, the number of scientific papers published each year keeps growing, and extracting frontier scientific keywords from this large body of literature has become a new challenge. Traditionally, the extraction is done by experts, which is inefficient and costly. This paper proposes a new algorithm based on bibliometric analysis and crowdsourcing. Part-of-speech tagging is used to obtain the nouns from scientific papers, and potential scientific keywords are selected from these nouns by bibliometric analysis. Finally, data from a crowdsourcing platform are used to check the potential scientific keywords and produce the results. Experiments are conducted on English scientific papers in computer science and biomedicine. The experimental results suggest that the proposed algorithm is effective at extraction and more efficient than the expert extraction procedure, so it can assist experts in analyzing frontier scientific keywords. In conclusion, the algorithm can perform automatic extraction and shows the possibility of a more automated and intelligent extraction procedure in the future.
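The three stages named in the abstract (noun extraction by part-of-speech tagging, bibliometric selection of candidates, crowdsourced checking) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the input is assumed to be already tagged with Penn Treebank tags, the bibliometric criterion is a simple growth ratio of document frequency between an earlier and a recent period, and the crowdsourcing check is a plain agreement threshold over yes/no votes. The function names, thresholds, and sample data are hypothetical.

```python
from collections import Counter

# Penn Treebank noun tags; the input is assumed to be POS-tagged already.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_nouns(tagged_tokens):
    """Keep lower-cased tokens whose POS tag marks a noun."""
    return [tok.lower() for tok, tag in tagged_tokens if tag in NOUN_TAGS]


def candidate_keywords(papers, recent_from, min_growth=2.0):
    """Select nouns whose share of papers grew between two periods.

    papers: list of (year, tagged_tokens) pairs.
    The growth ratio is an assumed stand-in for the paper's
    bibliometric analysis, which the abstract does not specify.
    """
    old_df, new_df = Counter(), Counter()
    n_old = n_new = 0
    for year, tagged in papers:
        nouns = set(extract_nouns(tagged))  # count each noun once per paper
        if year >= recent_from:
            new_df.update(nouns)
            n_new += 1
        else:
            old_df.update(nouns)
            n_old += 1
    candidates = []
    for noun, df in new_df.items():
        new_rate = df / n_new
        old_rate = (old_df[noun] + 1) / (n_old + 1)  # add-one smoothing
        if new_rate / old_rate >= min_growth:
            candidates.append(noun)
    return sorted(candidates)


def crowd_filter(candidates, votes, min_agreement=0.6):
    """Keep candidates that enough crowd workers marked as keywords."""
    kept = []
    for term in candidates:
        ballots = votes.get(term, [])
        if ballots and sum(ballots) / len(ballots) >= min_agreement:
            kept.append(term)
    return kept


# Hypothetical toy data, for illustration only.
papers = [
    (2015, [("neural", "JJ"), ("network", "NN"), ("training", "NN")]),
    (2016, [("network", "NN"), ("analysis", "NN")]),
    (2018, [("transformer", "NN"), ("network", "NN"), ("results", "NNS")]),
    (2018, [("transformer", "NN"), ("analysis", "NN"), ("results", "NNS")]),
]
votes = {
    "transformer": [True, True, False],
    "results": [False, False, True],
}

candidates = candidate_keywords(papers, recent_from=2018)
keywords = crowd_filter(candidates, votes)
print(candidates)  # ['results', 'transformer']
print(keywords)    # ['transformer']
```

In this toy run the frequency criterion alone also admits the generic noun "results", and the crowd votes remove it, which is the role the abstract assigns to the crowdsourcing step.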

Key words: Bibliometric, Crowdsourcing, Keyword extraction

CLC Number: TP391.1