Computer Science (计算机科学) ›› 2014, Vol. 41 ›› Issue (Z6): 339-342.
ZHANG Lin (张琳) and SHAO Tian-hao (邵天昊)
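Abstract: Drawing on the ideas of cloud computing, this work uses the MapReduce model to address the inability of the traditional Bayesian classification algorithm to handle large-scale data, substantially increasing classification speed. The algorithm is also modified to suit parallel execution: synonym merging and term-frequency filtering are added, which lower the dimensionality of the feature vectors and reduce misclassifications. Selected special keywords are then weighted to further improve classification accuracy. Finally, experiments on the Hadoop cloud computing platform show that the traditional text classification algorithm achieves a good speedup when parallelized on Hadoop, and that the improved algorithm raises classification precision.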
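The pipeline summarized in the abstract can be illustrated with a small, single-process sketch. This is not the authors' Hadoop implementation: the synonym table `SYNONYMS`, the weights in `KEYWORD_WEIGHTS`, the threshold `MIN_COUNT`, and all function names are hypothetical, and the multinomial naive Bayes scoring with Laplace smoothing is an assumption, since the abstract does not specify these details. The map and reduce steps are plain Python functions that mimic what a MapReduce job would emit and aggregate.

```python
import math
from collections import Counter, defaultdict

# Hypothetical values for illustration only; the abstract does not give them.
SYNONYMS = {"soccer": "football", "film": "movie"}  # synonym -> canonical term
KEYWORD_WEIGHTS = {"football": 2.0}                 # extra weight for special keywords
MIN_COUNT = 2                                       # assumed term-frequency threshold


def normalize(tokens):
    """Merge synonyms by mapping each token to its canonical term."""
    return [SYNONYMS.get(t, t) for t in tokens]


def map_phase(doc):
    """Map step: emit ((label, term), 1) for each term of one labeled document."""
    label, tokens = doc
    return [((label, term), 1) for term in normalize(tokens)]


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each (label, term) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def train(docs):
    """Build per-class term counts, keeping only sufficiently frequent terms."""
    pairs = [pair for doc in docs for pair in map_phase(doc)]
    counts = reduce_phase(pairs)

    # Term-frequency filtering: drop terms whose corpus-wide count is too low.
    totals = Counter()
    for (_, term), n in counts.items():
        totals[term] += n
    vocab = {term for term, n in totals.items() if n >= MIN_COUNT}

    term_counts = defaultdict(Counter)
    for (label, term), n in counts.items():
        if term in vocab:
            term_counts[label][term] = n

    doc_counts = Counter(label for label, _ in docs)
    return {"vocab": vocab, "term_counts": term_counts, "doc_counts": doc_counts}


def classify(model, tokens):
    """Return the label with the highest weighted log-posterior score."""
    vocab = model["vocab"]
    n_docs = sum(model["doc_counts"].values())
    best_label, best_score = None, -math.inf
    for label, n in model["doc_counts"].items():
        score = math.log(n / n_docs)  # log prior
        total = sum(model["term_counts"][label].values())
        for term in normalize(tokens):
            if term not in vocab:
                continue  # filtered-out or unseen terms are ignored
            # Laplace-smoothed likelihood, scaled by the keyword weight.
            p = (model["term_counts"][label][term] + 1) / (total + len(vocab))
            score += KEYWORD_WEIGHTS.get(term, 1.0) * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Calling `train` on a list of `(label, tokens)` pairs and then `classify(model, tokens)` on a new token list exercises all three modifications. In a real Hadoop job the map and reduce functions would run across many nodes, while the filtering and weighting logic would stay essentially the same.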