Computer Science (计算机科学) ›› 2021, Vol. 48 ›› Issue (9): 36-42. doi: 10.11896/jsjkx.210500207
Special topic: Intelligent Data Governance Technologies and Systems
王俊1,2,3, 王修来1,2, 庞威2, 赵鸿飞2
WANG Jun1,2,3, WANG Xiu-lai1,2, PANG Wei2, ZHAO Hong-fei2
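Abstract: Moving from imitation to innovation and from following to leading is not only a major transition that China's science and technology development must complete at the present stage, but also a major strategic need for national development. In recent years, researchers in China and abroad have studied technology trend analysis and hotspot tracking, but for lack of a systematic big data collection and governance system, their data analysis and mining has usually been confined to a single kind of data sample: scientific and technical literature. Aiming at forward-looking prediction of science and technology development, this paper comprehensively analyzes the massive heterogeneous data that influence the development of science and technology, including scientific and technical literature, scholar activity, forum hotspots, and social media comments. By building a data-driven big data governance system, it addresses the data governance problems that arise for science and technology big data during detection and discovery, precise collection, cleaning and aggregation, fusion processing, model construction, and predictive computation. On top of the governed data, an LDA model is used to predict and analyze technology trends. The results provide technical support for systematically discovering hidden information and inferring relationships in massive science and technology big data.
Key words: big data; big data governance; forward-looking prediction; system research; LDA model; data cleaning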
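The abstract names LDA as the model used for trend prediction but gives no implementation details. As a rough illustration only, the sketch below fits LDA with collapsed Gibbs sampling on an already tokenized toy corpus and averages the document-topic proportions per publication year, which is one simple way to read a topic trend. The corpus, the years, the hyperparameters `alpha` and `beta`, and the helper names `lda_gibbs` and `yearly_topic_share` are all assumptions made for this example and are not taken from the paper.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed document-topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab


def yearly_topic_share(theta, years):
    """Average the document-topic proportions of all documents per year."""
    grouped = defaultdict(list)
    for row, year in zip(theta, years):
        grouped[year].append(row)
    return {
        year: [sum(col) / len(rows) for col in zip(*rows)]
        for year, rows in sorted(grouped.items())
    }


# Hypothetical toy corpus: tokenized titles with publication years.
docs = [
    "blockchain ledger consensus blockchain".split(),
    "blockchain consensus access control".split(),
    "blockchain ledger access consensus".split(),
    "topic model text mining topic".split(),
    "text mining topic model corpus".split(),
]
years = [2019, 2019, 2020, 2020, 2021]

theta, topic_word, vocab = lda_gibbs(docs, n_topics=2)
trend = yearly_topic_share(theta, years)
for year, share in trend.items():
    print(year, [round(s, 2) for s in share])
```

Each printed row gives the average share of each topic among that year's documents; a share that rises over the years would suggest a growing topic. A real pipeline would run this only after the collection, cleaning, and fusion stages the abstract describes, and on a far larger corpus.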