Computer Science ›› 2021, Vol. 48 ›› Issue (9): 36-42. doi: 10.11896/jsjkx.210500207

Special topic: Intelligent Data Governance Technologies and Systems

Research on Big Data Governance for Science and Technology Forecast

WANG Jun1,2,3, WANG Xiu-lai1,2, PANG Wei2, ZHAO Hong-fei2

  1. 1 School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China
    2 East War District General Hospital, Nanjing 210000, China
    3 College of Media Technology, China Communication University, Nanjing 211172, China
  • Received: 2021-05-29 Revised: 2021-07-28 Online: 2021-09-15 Published: 2021-09-10
  • Corresponding author: WANG Xiu-lai (20201924002@nuist.edu.cn)
  • Author contact: intraweb@163.com
  • About author: WANG Jun, born in 1979, Ph.D, associate professor. His main research interests include artificial intelligence, machine learning and big data analysis.
    WANG Xiu-lai, born in 1970, Ph.D, professor, Ph.D supervisor. His main research interests include big data decision making, artificial intelligence and human resource management.
  • Supported by:
    14th Batch of "Six Talent Peaks" Innovative Talent Team Project in Jiangsu Province (TD-RJFW-005).

Abstract: The shift from imitation to innovation, and from following to leading, is not only a major transition that China's science and technology development must complete at this stage, but also a major strategic need for national development. In recent years, scholars at home and abroad have studied science and technology trend analysis and hot-spot tracking. For lack of a systematic big data collection and governance system, however, their data analysis and mining have often been limited to a single kind of data sample: scientific and technological literature. Aiming at forward-looking prediction of science and technology development, this paper comprehensively analyzes the massive heterogeneous data that influence that development, including scientific and technological literature, scholar activity, forum hot spots and social comments. By building a data-driven big data governance system, it addresses the data governance problems that arise in detection and discovery, precise collection, cleaning and aggregation, fusion processing, model construction, and predictive computation. On the basis of the governed data, the LDA model is used to predict and analyze technology trends. The results provide technical support for systematically solving the problems of hidden-information discovery and relationship reasoning in massive science and technology big data.

Key words: Big data, Big data governance, Data cleaning, Forward-looking forecast, LDA model, System research

CLC Number: TP311