计算机科学 ›› 2015, Vol. 42 ›› Issue (11): 80-83.doi: 10.11896/j.issn.1002-137X.2015.11.016

• 2014年全国高性能计算机学术年会 • 上一篇    下一篇

基于MapReduce的MIC算法并行化

吕瑞,蔡国永,裴广战   

  1. 桂林电子科技大学广西可信软件重点实验室 桂林541004,桂林电子科技大学广西可信软件重点实验室 桂林541004,桂林电子科技大学广西可信软件重点实验室 桂林541004
  • 出版日期:2018-11-14 发布日期:2018-11-14
  • 基金资助:
    本文受广西自然科学基金(2011GXNSFA018156),研究生创新项目(GDYCSZ201464)资助

Parallelization of MIC Algorithm Based on MapReduce

LV Rui, CAI Guo-yong and PEI Guang-zhan   

  • Online:2018-11-14 Published:2018-11-14

摘要: MIC是一种分析变量之间可能存在的关系的方法。该方法不仅能够有效识别出变量间各种复杂类型的关系,还能够准确描述噪音数据对存在关系的影响,对探索大数据集中变量之间的关系具有重要意义。针对该方法在处理包含大量变量的数据集时性能方面的不足,首次对它进行了基于MapReduce模型的并行化。提出的并行化方法首先对原算法进行更细颗粒度的划分,然后采用一种基于Map-Reduce-Map任务链的并行模型,该模型不仅有效地增加了并行的计算单元,还大大地降低了不必要的系统开销。最后,通过理论分析和实验验证得出,改进后的算法与原算法相比,在准确率方面具有等效性,运行速度大幅度提升且具有良好的可扩展性;实验同时指出了算法性能的提升与系统资源的关系。

关键词: 大数据,MIC,关系挖掘,MapReduce,并行化

Abstract: MIC is a kind of method to analyze the possible relationships existing between variables,which can not only effectively identify the various complex types of relationships,but also accurately describe the impact of noise on the relationships.Exploring variable relationships in the large data sets is considered significant to big data mining.Aiming at the shortage of performance in dealing with the data set containing a large number of variables,this paper proposed a parallelization method based on MapReduce.Firstly,a finer and smaller partition to the raw algorithm was conducted,and then a parallel model based on the task chain Map-Reduce-Map was adopted.The model not only effectively increases the parallel computing units,but also greatly reduces unnecessary consumption of system resource.Theoretical ana-lysis and experimental verification demonstrate that the improved algorithm has the same accuracy as well as the original algorithm and a great improvment in terms of running speed.The relationship between speed-up ratio and the amount of process Map2 shows that our method has a good scalability in the aspects of system resources.

Key words: Big data,MIC,Relationship mining,MapReduce,Parallelization

[1] Zhou A Y.Data intensive computing-challenges of data management techniques[J].Communications of CCF,2009,5(7):50-53
[2] Biaynicki-Birula I,Mycielski J.Uncertainty relations for information entropy in wave mechanics[J].Communications in Mathematical Physics,1975,44(2):129-132
[3] Wells III W M,Viola P,Atsumi H,et al.Multi-modal volume registration by maximization of mutual information[J].Medical Image Analysis,1996,1(1):35-51
[4] Batina L,Gierlichs B,Prouff E,et al.Mutual information analysis:a comprehensive study[J].Journal of Cryptology,2011,24(2):269-291
[5] Reshef D N,Reshef Y A,Finucane H K,et al.Detecting novel associations in large data sets[J].Science,2011,334(6062):1518-1524
[6] Labrinidis A,Jagadish H V.Challenges and opportunities with big data[J].Proceedings of the VLDB Endowment,2012,5(12):2032-2033
[7] McAfee A,Brynjolfsson E,Davenport T H,et al.Big Data[J].The management revolution.Harvard Bus Rev,2012,90(10):61-67
[8] Lohr S.The age of big data[Z].New York Times,2012,16(4):10-15
[9] Dean J,Ghemawat S.MapReduce:a flexible data processing tool[J].Communications of the ACM,2010,53(1):72-77
[10] Wang H,Huang W,Zhang Q,et al.An improved algorithm for the packing of unequal circles within a larger containing circle[J].European Journal of Operational Research,2002,141(2):440-453
[11] White T.Hadoop:The definitive guide[M].O’Reilly Media,Inc.,2012
[12] Groot S,Kitsuregawa M.Jumbo:Beyond MapReduce for workload balancing[C]∥36th International Conference on Very Large Data Bases.Singapore,2010
[13] Zheng Q.Improving MapReduce fault tolerance in the cloud[C]∥2010 IEEE International Symposium on Parallel & Distributed Processing,Workshops and Phd Forum (IPDPSW).IEEE,2010:1-6
[14] Lee K H,Lee Y J,Choi H,et al.Parallel data processing with MapReduce:a survey[J].ACM SIGMOD Record,2012,40(4):11-20
[15] McKenna A,Hanna M,Banks E,et al.The Genome Analysis Toolkit:a MapReduce framework for analyzing next-generation DNA sequencing data[J].Genome research,2010,20(9):1297-1303

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!