Computer Science ›› 2015, Vol. 42 ›› Issue (11): 80-83.doi: 10.11896/j.issn.1002-137X.2015.11.016

Previous Articles     Next Articles

Parallelization of MIC Algorithm Based on MapReduce

LV Rui, CAI Guo-yong and PEI Guang-zhan   

  • Online:2018-11-14 Published:2018-11-14

Abstract: MIC is a kind of method to analyze the possible relationships existing between variables,which can not only effectively identify the various complex types of relationships,but also accurately describe the impact of noise on the relationships.Exploring variable relationships in the large data sets is considered significant to big data mining.Aiming at the shortage of performance in dealing with the data set containing a large number of variables,this paper proposed a parallelization method based on MapReduce.Firstly,a finer and smaller partition to the raw algorithm was conducted,and then a parallel model based on the task chain Map-Reduce-Map was adopted.The model not only effectively increases the parallel computing units,but also greatly reduces unnecessary consumption of system resource.Theoretical ana-lysis and experimental verification demonstrate that the improved algorithm has the same accuracy as well as the original algorithm and a great improvment in terms of running speed.The relationship between speed-up ratio and the amount of process Map2 shows that our method has a good scalability in the aspects of system resources.

Key words: Big data,MIC,Relationship mining,MapReduce,Parallelization

[1] Zhou A Y.Data intensive computing-challenges of data management techniques[J].Communications of CCF,2009,5(7):50-53
[2] Biaynicki-Birula I,Mycielski J.Uncertainty relations for information entropy in wave mechanics[J].Communications in Mathematical Physics,1975,44(2):129-132
[3] Wells III W M,Viola P,Atsumi H,et al.Multi-modal volume registration by maximization of mutual information[J].Medical Image Analysis,1996,1(1):35-51
[4] Batina L,Gierlichs B,Prouff E,et al.Mutual information analysis:a comprehensive study[J].Journal of Cryptology,2011,24(2):269-291
[5] Reshef D N,Reshef Y A,Finucane H K,et al.Detecting novel associations in large data sets[J].Science,2011,334(6062):1518-1524
[6] Labrinidis A,Jagadish H V.Challenges and opportunities with big data[J].Proceedings of the VLDB Endowment,2012,5(12):2032-2033
[7] McAfee A,Brynjolfsson E,Davenport T H,et al.Big Data[J].The management revolution.Harvard Bus Rev,2012,90(10):61-67
[8] Lohr S.The age of big data[Z].New York Times,2012,16(4):10-15
[9] Dean J,Ghemawat S.MapReduce:a flexible data processing tool[J].Communications of the ACM,2010,53(1):72-77
[10] Wang H,Huang W,Zhang Q,et al.An improved algorithm for the packing of unequal circles within a larger containing circle[J].European Journal of Operational Research,2002,141(2):440-453
[11] White T.Hadoop:The definitive guide[M].O’Reilly Media,Inc.,2012
[12] Groot S,Kitsuregawa M.Jumbo:Beyond MapReduce for workload balancing[C]∥36th International Conference on Very Large Data Bases.Singapore,2010
[13] Zheng Q.Improving MapReduce fault tolerance in the cloud[C]∥2010 IEEE International Symposium on Parallel & Distributed Processing,Workshops and Phd Forum (IPDPSW).IEEE,2010:1-6
[14] Lee K H,Lee Y J,Choi H,et al.Parallel data processing with MapReduce:a survey[J].ACM SIGMOD Record,2012,40(4):11-20
[15] McKenna A,Hanna M,Banks E,et al.The Genome Analysis Toolkit:a MapReduce framework for analyzing next-generation DNA sequencing data[J].Genome research,2010,20(9):1297-1303

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!