计算机科学 ›› 2015, Vol. 42 ›› Issue (10): 208-210.
魏玲,魏永江,高长元
WEI Ling, WEI Yong-jiang and GAO Chang-yuan
摘要: 为提高Apriori算法挖掘频繁项目集的效率,引进了Bigtable技术与MapReduce模型来对Apriori算法进行优化,设计出大数据环境下挖掘频繁项目集的新算法BM-Apriori算法。与单纯基于MapReduce模型的Apriori改进算法相比,新算法利用Bigtable的时间戳属性代替了键/值对的产生,只需扫描数据库一次即可,节约了模式匹配的时间。同时,BM-Apriori算法在项集列表中新增事务标号列,自动获取事务标号以计算支持度。将BM-Apriori算法在Hadoop平台上进行了实验,结果表明Bigtable技术的融入使得BM-Apriori算法具有更高的效率与可拓展性。
[1] Hajian S,Domingo-Ferrer J.A methodology for direct and indi-rect discrimination prevention in data mining [J].IEEE Transactions on Knowledge and Data Engineering,2013,25(7):1445-1459 [2] Lara J,Lizcano D,Martinez A,et al.A UML profile for the conceptual modeling of structurally complex data:Easing human effort in the KDD process [J].Information and Software Techno-logy,2014,56(3):335-351 [3] Agrawal R,Imielinski T,Swami A.Database mining:a performance perspective [J].IEEE Transactions on Knowledge and Data Engineering,1993,5(6):914-925 [4] 张震,汪斌强,陈庶樵,等.基于多维计数型布鲁姆过滤器的大流检测机制[J].电子与信息学报,2010,32(7):1608-1613 Zhang Zhen,Wang Bin-qiang,Chen Shu-qiao,et al.A Mechanism of Identifying Heavy Hitters Based on Multi-dimensional Counting Bloom Filter[J].Journal of Electronics & Information Technology,2010,32(7):1608-1613 [5] Wang B L,Shen Y G.Improvement of Apriori algorithm based on boolean matrix [J].Advanced Materials Research,2011,159:144-148 [6] 罗丹,李陶深.一种基于压缩矩阵的Apriori算法改进研究[J].计算机科学,2013,40(12):75-78 Luo Dan,Li Tao-shen.Research on improved Apriori algorithm based on matrix compression [J].Computer Science,2013,40(12):75-78 [7] 李晓虹,尚晋.一种改进的新Apriori算法[J].计算机科学,2007,32(4):196-197 Li Xiao-hong,Shang Jin.An improved Apriori algorithm[J].Computer Science,2007,32(4):196-197 [8] Grudzinski P,Wojciechowski M.Integration of candidate hashtrees in concurrent processing of frequent itemset queries using Apriori[J].Control and Cybernetics,2009,38(1):47-65 [9] Jongwook W.Market Basket Analysis algorithms with MapReduce[J].Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery,2013,3(6):445-452 [10] Karim R,Hossain A,Rashid M,et al.A MapReduce Framework for Mining Maximal Contiguous Frequent Patterns in Large DNA Sequence Datasets [J].IETE Techical Review,2012,29(2):162-168 [11] Chang F,Dean J,Ghemawat S,et al.Bigtable:A distributedstorage system for structured data [J].ACM Transactions on Computer Systems,2008,46(2):205-218 [12] Kim W.Web data stores (aka NoSQL databases):a data model and data management perspective [J].International Journal of Web and Grid Services,2014,10(1):100-110 |
No related articles found! |
|