计算机科学 ›› 2015, Vol. 42 ›› Issue (11): 80-83.doi: 10.11896/j.issn.1002-137X.2015.11.016
吕瑞,蔡国永,裴广战
LV Rui, CAI Guo-yong and PEI Guang-zhan
摘要: MIC是一种分析变量之间可能存在的关系的方法。该方法不仅能够有效识别出变量间各种复杂类型的关系,还能够准确描述噪音数据对存在关系的影响,对探索大数据集中变量之间的关系具有重要意义。针对该方法在处理包含大量变量的数据集时性能方面的不足,首次对它进行了基于MapReduce模型的并行化。提出的并行化方法首先对原算法进行更细颗粒度的划分,然后采用一种基于Map-Reduce-Map任务链的并行模型,该模型不仅有效地增加了并行的计算单元,还大大地降低了不必要的系统开销。最后,通过理论分析和实验验证得出,改进后的算法与原算法相比,在准确率方面具有等效性,运行速度大幅度提升且具有良好的可扩展性;实验同时指出了算法性能的提升与系统资源的关系。
[1] Zhou A Y.Data intensive computing-challenges of data management techniques[J].Communications of CCF,2009,5(7):50-53 [2] Biaynicki-Birula I,Mycielski J.Uncertainty relations for information entropy in wave mechanics[J].Communications in Mathematical Physics,1975,44(2):129-132 [3] Wells III W M,Viola P,Atsumi H,et al.Multi-modal volume registration by maximization of mutual information[J].Medical Image Analysis,1996,1(1):35-51 [4] Batina L,Gierlichs B,Prouff E,et al.Mutual information analysis:a comprehensive study[J].Journal of Cryptology,2011,24(2):269-291 [5] Reshef D N,Reshef Y A,Finucane H K,et al.Detecting novel associations in large data sets[J].Science,2011,334(6062):1518-1524 [6] Labrinidis A,Jagadish H V.Challenges and opportunities with big data[J].Proceedings of the VLDB Endowment,2012,5(12):2032-2033 [7] McAfee A,Brynjolfsson E,Davenport T H,et al.Big Data[J].The management revolution.Harvard Bus Rev,2012,90(10):61-67 [8] Lohr S.The age of big data[Z].New York Times,2012,16(4):10-15 [9] Dean J,Ghemawat S.MapReduce:a flexible data processing tool[J].Communications of the ACM,2010,53(1):72-77 [10] Wang H,Huang W,Zhang Q,et al.An improved algorithm for the packing of unequal circles within a larger containing circle[J].European Journal of Operational Research,2002,141(2):440-453 [11] White T.Hadoop:The definitive guide[M].O’Reilly Media,Inc.,2012 [12] Groot S,Kitsuregawa M.Jumbo:Beyond MapReduce for workload balancing[C]∥36th International Conference on Very Large Data Bases.Singapore,2010 [13] Zheng Q.Improving MapReduce fault tolerance in the cloud[C]∥2010 IEEE International Symposium on Parallel & Distributed Processing,Workshops and Phd Forum (IPDPSW).IEEE,2010:1-6 [14] Lee K H,Lee Y J,Choi H,et al.Parallel data processing with MapReduce:a survey[J].ACM SIGMOD Record,2012,40(4):11-20 [15] McKenna A,Hanna M,Banks E,et al.The Genome Analysis Toolkit:a MapReduce framework for analyzing next-generation DNA sequencing data[J].Genome research,2010,20(9):1297-1303 |
No related articles found! |
|