Computer Science, 2015, Vol. 42, Issue (10): 208-210.

Improved Apriori Algorithm Based on Bigtable and MapReduce

WEI Ling, WEI Yong-jiang and GAO Chang-yuan   

Online: 2018-11-14   Published: 2018-11-14

Abstract: The BM-Apriori algorithm is designed for big data to address the poor efficiency of Apriori in mining frequent itemsets. BM-Apriori combines Bigtable and MapReduce to optimize the Apriori algorithm. Compared with improved Apriori algorithms based solely on the MapReduce model, BM-Apriori uses Bigtable timestamps to avoid generating a large number of key/value pairs, which saves pattern-matching time and requires only a single scan of the database. In addition, to obtain transaction marks automatically, a transaction mark column is added to the set list used for computing support counts. BM-Apriori was run on the Hadoop platform. The experimental results show that BM-Apriori has higher efficiency and scalability.
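The abstract contrasts two ways of counting support: MapReduce-only Apriori variants that emit many key/value pairs, and an approach that scans the database once and keeps per-transaction marks for support counting. The Python sketch below illustrates those two ideas on a toy in-memory database; it is not the authors' BM-Apriori. It assumes a vertical layout in which each itemset maps to the set of transaction IDs containing it (a stand-in for the transaction mark column), and it does not model Bigtable timestamps, Hadoop, or any distribution. All function names and the sample data are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def build_tid_sets(db: dict[int, set[str]]) -> dict[frozenset[str], set[int]]:
    """Single scan of the database: map each item to the IDs of the
    transactions that contain it (a stand-in for a transaction mark column)."""
    tids: dict[frozenset[str], set[int]] = defaultdict(set)
    for tid, items in db.items():
        for item in items:
            tids[frozenset([item])].add(tid)
    return dict(tids)


def generate_candidates(frequent: list[frozenset[str]], k: int) -> set[frozenset[str]]:
    """Apriori join and prune: combine frequent (k-1)-itemsets into k-itemsets
    and keep only candidates whose (k-1)-subsets are all frequent."""
    prev = set(frequent)
    candidates: set[frozenset[str]] = set()
    for a, b in combinations(frequent, 2):
        union = a | b
        if len(union) == k and all(
            frozenset(sub) in prev for sub in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def apriori_tidsets(db: dict[int, set[str]], min_support: int) -> dict[frozenset[str], int]:
    """Level-wise Apriori whose support counts come from intersecting
    transaction-ID sets, so the raw transactions are read only once."""
    level = {s: t for s, t in build_tid_sets(db).items() if len(t) >= min_support}
    supports = {s: len(t) for s, t in level.items()}
    k = 2
    while level:
        next_level: dict[frozenset[str], set[int]] = {}
        for cand in generate_candidates(list(level), k):
            # Transactions containing cand = intersection over its (k-1)-subsets,
            # all of which are frequent and therefore present in `level`.
            tid_set = set.intersection(
                *(level[frozenset(sub)] for sub in combinations(cand, k - 1))
            )
            if len(tid_set) >= min_support:
                next_level[cand] = tid_set
        supports.update({s: len(t) for s, t in next_level.items()})
        level = next_level
        k += 1
    return supports


def mapreduce_counts(db: dict[int, set[str]], candidates: set[frozenset[str]]):
    """Naive MapReduce-style counting for comparison: the map step emits one
    (candidate, 1) pair per candidate found in each transaction; reduce sums.
    Returns the counts and the number of emitted key/value pairs."""
    emitted = [(c, 1) for items in db.values() for c in candidates if c <= items]
    counts: dict[frozenset[str], int] = defaultdict(int)
    for key, one in emitted:
        counts[key] += one
    return dict(counts), len(emitted)


if __name__ == "__main__":
    # Hypothetical toy database: transaction ID -> items bought.
    db = {
        1: {"bread", "milk"},
        2: {"bread", "diaper", "beer", "eggs"},
        3: {"milk", "diaper", "beer", "cola"},
        4: {"bread", "milk", "diaper", "beer"},
        5: {"bread", "milk", "diaper", "cola"},
    }
    supports = apriori_tidsets(db, min_support=3)
    frequent_items = [s for s in supports if len(s) == 1]
    _, pairs_emitted = mapreduce_counts(db, generate_candidates(frequent_items, 2))
    print(len(supports), pairs_emitted)  # prints "8 16"
```

In this example the tid-set version reads the five transactions once and finds 8 frequent itemsets ({bread, milk, diaper} is pruned with support 2), while the MapReduce-style baseline emits 16 key/value pairs just for the 2-itemset level. Intersecting transaction-ID sets trades memory for fewer database passes; how BM-Apriori realizes this with Bigtable timestamps is not shown here.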

Key words: Apriori algorithm, Bigtable, MapReduce, big data
