Computer Science, 2013, Vol. 40, Issue (6): 192-195.

• Software and Database Technology •

Real Time Analytics Study of Big Data Based on Tree-lib

SHEN Lai-xin, WANG Wei

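  1. Key Laboratory of Embedded System and Service Computing (Ministry of Education), Tongji University, Shanghai 200092; School of Information Engineering, Huangshan University, Huangshan 245041
  • Published: 2018-11-16  Online: 2018-11-16
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61103068, 61174158) and the Anhui Provincial Outstanding Young Talent Fund (2012SQRL183).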


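Abstract: To improve the storage and parallel processing capabilities for big data, a real-time concurrent analysis and management model was built around the column-oriented store Infobright and the distributed MySQL Cluster, as a secondary development of the open-source Brighthouse engine. The management program Tree-lib is used to visually monitor, maintain, and manage the distributed big data. Experimental results show that the combination of Infobright and MySQL Cluster provides highly compressed storage, multiple concurrent queries, and efficient real-time analysis of big data, and that Tree-lib carries out the generation, detection, update, backup, and disaster recovery of trees and libraries, achieving visual, bidirectional management and maintenance.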
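The abstract credits the Infobright/Brighthouse column store with highly compressed storage and efficient analytical queries. The toy Python sketch below illustrates two general column-store ideas behind such claims, under stated assumptions: a column is split into fixed-size packs that are stored compressed, and small per-pack min/max statistics let a range query skip or answer packs without decompressing them. The published Brighthouse design describes comparable per-pack statistics (its Knowledge Grid) over packs of 65,536 values; here run-length encoding is only a simple stand-in for the engine's real compressors, the pack size is shrunk to 4 for readability, and the names PACK_SIZE, DataPack, Column and count_between are hypothetical. This is not Infobright's code or the authors' method.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical toy parameter: Brighthouse packs hold 65,536 values.
PACK_SIZE = 4

Run = Tuple[int, int]  # (value, repeat count)


def rle_encode(values: List[int]) -> List[Run]:
    """Run-length encode values as (value, count) pairs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand (value, count) pairs back into the original list."""
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


@dataclass
class DataPack:
    runs: List[Run]  # compressed payload
    lo: int          # metadata: minimum value in the pack
    hi: int          # metadata: maximum value in the pack
    rows: int        # metadata: number of values in the pack


@dataclass
class Column:
    packs: List[DataPack] = field(default_factory=list)

    @classmethod
    def from_values(cls, values: List[int]) -> "Column":
        """Split values into packs, compress each, and record min/max/rows."""
        col = cls()
        for i in range(0, len(values), PACK_SIZE):
            chunk = values[i:i + PACK_SIZE]
            col.packs.append(
                DataPack(rle_encode(chunk), min(chunk), max(chunk), len(chunk)))
        return col

    def count_between(self, a: int, b: int) -> Tuple[int, int]:
        """Count values v with a <= v <= b.

        Returns (count, packs_decompressed) so the pruning effect is visible.
        """
        count = decoded = 0
        for p in self.packs:
            if p.hi < a or p.lo > b:
                continue                  # irrelevant pack: no value can match
            if a <= p.lo and p.hi <= b:
                count += p.rows           # relevant pack: every value matches
                continue
            decoded += 1                  # suspect pack: decompress and scan
            count += sum(1 for v in rle_decode(p.runs) if a <= v <= b)
        return count, decoded


if __name__ == "__main__":
    values = [1, 1, 1, 2, 5, 5, 6, 6, 9, 9, 9, 9, 3, 4, 4, 4]
    col = Column.from_values(values)
    print(col.count_between(4, 6))  # (7, 1)
    print(sum(len(p.runs) for p in col.packs), "runs for", len(values), "values")
```

For the query 4 <= v <= 6, only the last pack (min 3, max 4) straddles a bound and has to be decompressed; the other three packs are answered or skipped from metadata alone, and the 16 values are stored as 7 runs. With realistic pack sizes and well-clustered data, this kind of metadata-driven pruning is one way a column store can avoid full scans for ad hoc analytical queries.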

Key words: Big data, Infobright, MySQL Cluster, Brighthouse engine, Tree-lib, Disaster recovery



