基于Bigtable与MapReduce的Apriori算法改进

计算机科学 ›› 2015, Vol. 42 ›› Issue (10): 208-210.

基于Bigtable与MapReduce的Apriori算法改进

魏玲,魏永江,高长元

哈尔滨理工大学管理信息系统重点实验室哈尔滨150000,哈尔滨理工大学管理信息系统重点实验室哈尔滨150000,哈尔滨理工大学管理信息系统重点实验室哈尔滨150000

出版日期:2018-11-14 发布日期:2018-11-14
基金资助:
本文受国家自然科学基金资助

Improved Apriori Algorithm Based on Bigtable and MapReduce

WEI Ling, WEI Yong-jiang and GAO Chang-yuan

Online:2018-11-14 Published:2018-11-14

摘要/Abstract

摘要： 为提高Apriori算法挖掘频繁项目集的效率,引进了Bigtable技术与MapReduce模型来对Apriori算法进行优化,设计出大数据环境下挖掘频繁项目集的新算法BM-Apriori算法。与单纯基于MapReduce模型的Apriori改进算法相比,新算法利用Bigtable的时间戳属性代替了键/值对的产生,只需扫描数据库一次即可,节约了模式匹配的时间。同时,BM-Apriori算法在项集列表中新增事务标号列,自动获取事务标号以计算支持度。将BM-Apriori算法在Hadoop平台上进行了实验,结果表明Bigtable技术的融入使得BM-Apriori算法具有更高的效率与可拓展性。

关键词: Apriori算法,Bigtable,MapReduce,大数据

Abstract: BM-Apriori algorithm was designed for big data to address the poor efficiency problem of Apriori in mining frequent item sets.BM-Apriori takes advantages of Bigtable and MapReduce together to optimize Apriori algorithm.Compared with the improved Apriori algorithm simply based on MapReduce model,timestamp of Bigtable is utilized in this algorithm to avoid generating a large number of key/value pairs.It saves the pattern matching time and scans the database only once.Also,to obtain transaction marks automatically,transaction mark column is added to set list for computing support numbers.BM-Apriori was executed on Hadoop platform.The experimental results show that BM-Apriori has higher efficiency and scalability.

Key words: Apriori algorithm,Bigtable,MapReduce,Big data

魏玲,魏永江,高长元. 基于Bigtable与MapReduce的Apriori算法改进[J]. 计算机科学, 2015, 42(10): 208-210. https://doi.org/

WEI Ling, WEI Yong-jiang and GAO Chang-yuan. Improved Apriori Algorithm Based on Bigtable and MapReduce[J]. Computer Science, 2015, 42(10): 208-210. https://doi.org/

参考文献

[1] Hajian S,Domingo-Ferrer J.A methodology for direct and indi-rect discrimination prevention in data mining [J].IEEE Transactions on Knowledge and Data Engineering,2013,25(7):1445-1459
[2] Lara J,Lizcano D,Martinez A,et al.A UML profile for the conceptual modeling of structurally complex data:Easing human effort in the KDD process [J].Information and Software Techno-logy,2014,56(3):335-351
[3] Agrawal R,Imielinski T,Swami A.Database mining:a performance perspective [J].IEEE Transactions on Knowledge and Data Engineering,1993,5(6):914-925
[4] 张震,汪斌强,陈庶樵,等.基于多维计数型布鲁姆过滤器的大流检测机制[J].电子与信息学报,2010,32(7):1608-1613 Zhang Zhen,Wang Bin-qiang,Chen Shu-qiao,et al.A Mechanism of Identifying Heavy Hitters Based on Multi-dimensional Counting Bloom Filter[J].Journal of Electronics & Information Technology,2010,32(7):1608-1613
[5] Wang B L,Shen Y G.Improvement of Apriori algorithm based on boolean matrix [J].Advanced Materials Research,2011,159:144-148
[6] 罗丹,李陶深.一种基于压缩矩阵的Apriori算法改进研究[J].计算机科学,2013,40(12):75-78 Luo Dan,Li Tao-shen.Research on improved Apriori algorithm based on matrix compression [J].Computer Science,2013,40(12):75-78
[7] 李晓虹,尚晋.一种改进的新Apriori算法[J].计算机科学,2007,32(4):196-197 Li Xiao-hong,Shang Jin.An improved Apriori algorithm[J].Computer Science,2007,32(4):196-197
[8] Grudzinski P,Wojciechowski M.Integration of candidate hashtrees in concurrent processing of frequent itemset queries using Apriori[J].Control and Cybernetics,2009,38(1):47-65
[9] Jongwook W.Market Basket Analysis algorithms with MapReduce[J].Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery,2013,3(6):445-452
[10] Karim R,Hossain A,Rashid M,et al.A MapReduce Framework for Mining Maximal Contiguous Frequent Patterns in Large DNA Sequence Datasets [J].IETE Techical Review,2012,29(2):162-168
[11] Chang F,Dean J,Ghemawat S,et al.Bigtable:A distributedstorage system for structured data [J].ACM Transactions on Computer Systems,2008,46(2):205-218
[12] Kim W.Web data stores (aka NoSQL databases):a data model and data management perspective [J].International Journal of Web and Grid Services,2014,10(1):100-110

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed