Computer Science, 2015, Vol. 42, Issue (11): 48-52. doi: 10.11896/j.issn.1002-137X.2015.11.008


Analysis of Architecture Characteristics of Big Data Workloads

LUO Jian-ping, XIE Meng-yao and WANG Hua-feng   

Online: 2018-11-14  Published: 2018-11-14

Abstract: This work targets big data workloads for offline analysis and interactive queries. We first analyzed the features common to these workloads, extracted their common set of operations, and grouped the workloads accordingly. We then ran the workloads on the BigDataBench platform and collected their micro-architectural characteristics, applying PCA for dimensionality reduction and the SimpleKMeans algorithm for clustering. The study shows that big data workloads share a common set of operations, such as Join and Cross Product. It also shows that some workloads have many similar features; for example, the Difference and Projection operations share micro-architectural characteristics. These results can guide the design of hardware platforms such as processors and the optimization of applications, and they offer insight into the implementation of big data benchmark platforms.
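The analysis step named in the abstract (dimensionality reduction with PCA, then k-means clustering) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration: the workload names `w0` to `w5`, the four metric columns, and all numeric values are synthetic placeholders, not measurements from the paper. The clustering here is a plain Lloyd's k-means written with the Python standard library, standing in for the SimpleKMeans implementation the authors mention.

```python
import math

# Synthetic placeholder data (NOT from the paper): one row per hypothetical
# workload; columns are assumed metrics [IPC, L1I MPKI, L2 MPKI, branch MPKI].
names = ["w0", "w1", "w2", "w3", "w4", "w5"]
metrics = [
    [1.20, 5.0, 10.0, 3.0],
    [1.25, 5.5, 11.0, 3.2],
    [1.15, 4.8, 9.5, 2.9],
    [0.60, 20.0, 30.0, 8.0],
    [0.65, 21.0, 29.0, 8.3],
    [0.58, 19.5, 31.0, 7.8],
]

def standardize(rows):
    """Z-score each column so metrics with large units do not dominate."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def top_components(cov, k, iters=500):
    """Top-k eigenvectors via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        start = [float(i + 1) for i in range(d)]
        s = math.sqrt(sum(x * x for x in start))
        v = [x / s for x in start]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                  for i in range(d))
        comps.append(v)
        # Deflate: remove the found component before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return comps

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first seeding."""
    centroids = [points[0][:]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(far[:])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels

z = standardize(metrics)
components = top_components(covariance(z), k=2)
projected = [[sum(a * b for a, b in zip(row, c)) for c in components]
             for row in z]
labels = kmeans(projected, k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

With this placeholder input the first three workloads fall into one cluster and the last three into the other, which is the kind of grouping the abstract describes for operations with similar micro-architectural behavior. In practice a library implementation of PCA and k-means would normally replace the hand-written routines.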

Key words: Big data, Big data workloads, Architecture characteristic

