Computer Science ›› 2016, Vol. 43 ›› Issue (12): 189-194. doi: 10.11896/j.issn.1002-137X.2016.12.034

• Data Mining •

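Research on Evolution and Updating among Multi-source Data Based on Big Data

YU Fang, CHEN Sheng-shuang, LI Shi-jun and YU Wei

  1. School of Science, Wuhan University of Technology, Wuhan 430070; School of Science, Wuhan University of Technology, Wuhan 430070; School of Computer Science, Wuhan University, Wuhan 430072; School of Computer Science, Wuhan University, Wuhan 430072
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    the National Natural Science Foundation of China (61502350) and the Natural Science Foundation of Hubei Province (2014CFB289)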

Abstract: Multi-source data in a big data environment is large in volume, diverse in type, and fast-changing, and these characteristics pose new challenges for data updating. By analyzing the characteristics of multi-source data under big data, the concept of evolutionary data is defined, and on this basis a dynamic frequency conversion traversal updating model for big data is established. First, by abstracting the way data evolves, the concepts of the potential and stability of evolutionary data are established, from which more general evolutionary operation tools in the algebraic sense are derived. Second, by applying these operation tools to the practical task of updating big data, a probability-based frequency conversion traversal and dynamic weighting model is derived. Finally, experiments verify that the Dynamic Frequency Conversion Traversal (DFCT) model achieves high update efficiency for multi-source data in a big data environment.

Key words: Big data, Evolutionary data, DFCT model, Data updating
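
The abstract names a probability-based frequency conversion traversal with dynamic weights but does not give its formulas. The following is only a minimal sketch of that general idea, not the paper's DFCT model: each source is visited with probability proportional to a weight, and the weight rises when a visit finds changed data and decays otherwise. All names and parameters (`Source`, `pick_source`, `update_weight`, `gain`, `decay`, `floor`, and the simulated change probabilities) are hypothetical and chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    weight: float = 1.0  # hypothetical dynamic weight; higher means visited more often


def pick_source(sources, rng):
    """Pick the next source with probability proportional to its weight."""
    total = sum(s.weight for s in sources)
    r = rng.random() * total
    acc = 0.0
    for s in sources:
        acc += s.weight
        if r < acc:
            return s
    return sources[-1]  # guard against floating-point round-off


def update_weight(source, changed, gain=0.5, decay=0.9, floor=0.1):
    """Raise the weight after an observed change; otherwise decay it toward a floor."""
    if changed:
        source.weight += gain
    else:
        source.weight = max(floor, source.weight * decay)


def simulate(change_prob, rounds=2000, seed=0):
    """Simulate polling sources whose data changes with the given (assumed) probabilities."""
    rng = random.Random(seed)
    sources = [Source(name) for name in change_prob]
    visits = {name: 0 for name in change_prob}
    for _ in range(rounds):
        s = pick_source(sources, rng)
        visits[s.name] += 1
        changed = rng.random() < change_prob[s.name]
        update_weight(s, changed)
    return visits, {s.name: s.weight for s in sources}


if __name__ == "__main__":
    visits, weights = simulate({"fast": 0.9, "slow": 0.05})
    print(visits)
    print(weights)
```

In this sketch a rarely changing source is still visited occasionally because its weight never falls below `floor`, so a later burst of changes can raise its visiting frequency again.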

