Computer Science ›› 2021, Vol. 48 ›› Issue (9): 36-42. doi: 10.11896/jsjkx.210500207

• Intelligent Data Governance Technology and Systems

Research on Big Data Governance for Science and Technology Forecast

WANG Jun1,2,3, WANG Xiu-lai1,2, PANG Wei2, ZHAO Hong-fei2

  1 School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China
  2 East War District General Hospital, Nanjing 210000, China
  3 College of Media Technology, China Communication University, Nanjing 211172, China
  • Received: 2021-05-29 Revised: 2021-07-28 Online: 2021-09-15 Published: 2021-09-10
  • Corresponding author: WANG Xiu-lai (20201924002@nuist.edu.cn)
  • Author contact: intraweb@163.com
  • About author: WANG Jun, born in 1979, Ph.D, associate professor. His main research interests include artificial intelligence, machine learning and big data analysis.
    WANG Xiu-lai, born in 1970, Ph.D, professor, Ph.D supervisor. His main research interests include big data decision making, artificial intelligence and human resource management.
  • Supported by:
    14th Batch of 'Six Talent Peaks' Innovative Talent Team Project in Jiangsu Province (TD-RJFW-005).

Abstract: Moving from imitation to innovation and from following to leading is not only a major transition that China's science and technology development must complete at this stage, but also a major strategic need for national development. In recent years, scholars in China and abroad have studied science and technology trend analysis and hot-spot tracking, but for lack of a systematic big data collection and governance system, their data analysis and mining has often been limited to a single type of data sample: scientific and technical literature. Aiming at forward-looking prediction of science and technology development, this paper comprehensively analyzes the massive heterogeneous data that affect that development, including scientific and technical literature, scholar activity, forum hot topics and social media comments. By building a data-driven big data governance system, it addresses the data remediation problems that science and technology big data raise during detection and discovery, precise collection, cleaning and aggregation, fusion processing, model construction and predictive computation. On the basis of this remediation, an LDA model is used to predict and analyze technology trends. The results provide technical support for systematically solving hidden-information discovery and relationship reasoning in massive science and technology big data.

Key words: Big data, Big data governance, Forward-looking forecast, System research, LDA model, Data cleaning

CLC Number:

  • TP311