Computer Science, 2014, Vol. 41, Issue (Z6): 339-342.

Improved Bayesian Text Classification Algorithm in Cloud Computing Environment

ZHANG Lin and SHAO Tian-hao   

Online: 2018-11-14 Published: 2018-11-14

Abstract: The traditional Bayesian classification algorithm is poorly suited to large-scale data. Drawing on the idea of cloud computing, this paper parallelizes the algorithm with the MapReduce model, which greatly improves classification speed. The algorithm is also adapted to the characteristics of parallel execution. Combining synonym processing with word-frequency filtering reduces the dimensionality of the feature vectors and the number of false positives. Particular keywords are then weighted to improve classification accuracy. Finally, experiments on the Hadoop cloud computing platform show that the traditional text classification algorithm achieves good speedup once parallelized, and that the improved algorithm raises classification accuracy.

Key words: Cloud computing, Text classification, Parallel, Hadoop
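
The abstract names three ingredients: MapReduce-style parallel counting, dimensionality reduction through synonyms and word-frequency filtering, and keyword weighting. The sketch below is one possible reading of how they could fit together in a naive Bayes text classifier. It is not the authors' implementation. The synonym table, the keyword weights, the frequency cut-off `MIN_FREQ`, and the Laplace smoothing are all assumptions made for illustration, and the map and reduce steps run in a single process rather than on a Hadoop cluster.

```python
import math
from collections import Counter, defaultdict
from functools import reduce

# Hypothetical settings for illustration only; none come from the paper.
SYNONYMS = {"automobile": "car", "film": "movie"}  # word -> canonical form
KEYWORD_WEIGHTS = {"goal": 2.0}                    # extra weight for chosen keywords
MIN_FREQ = 2                                       # assumed word-frequency cut-off


def normalize(tokens):
    """Replace each token by its canonical synonym, if one is listed."""
    return [SYNONYMS.get(t, t) for t in tokens]


def map_phase(doc):
    """Map step: count (label, word) pairs for one labeled document."""
    label, tokens = doc
    counts = Counter((label, w) for w in normalize(tokens))
    counts[(label, None)] += 1  # None marks the per-class document count
    return counts


def reduce_phase(a, b):
    """Reduce step: merge two partial count tables by summing them."""
    return a + b


def train(docs):
    """Aggregate counts, then drop words rarer than MIN_FREQ."""
    totals = reduce(reduce_phase, map(map_phase, docs), Counter())
    word_freq = Counter()
    for (label, w), c in totals.items():
        if w is not None:
            word_freq[w] += c
    vocab = {w for w, c in word_freq.items() if c >= MIN_FREQ}
    priors = {label: c for (label, w), c in totals.items() if w is None}
    word_counts = defaultdict(Counter)
    for (label, w), c in totals.items():
        if w in vocab:
            word_counts[label][w] = c
    return priors, word_counts, vocab


def classify(tokens, model):
    """Pick the label with the highest weighted log-posterior score."""
    priors, word_counts, vocab = model
    n_docs = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label in sorted(priors):
        total = sum(word_counts[label].values())
        score = math.log(priors[label] / n_docs)
        for w in normalize(tokens):
            if w not in vocab:
                continue  # filtered-out words do not contribute
            # Laplace smoothing (an assumption, not stated in the abstract)
            p = (word_counts[label][w] + 1) / (total + len(vocab))
            score += KEYWORD_WEIGHTS.get(w, 1.0) * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus, invented for this example.
DOCS = [
    ("sports", ["goal", "match", "team"]),
    ("sports", ["goal", "team", "win"]),
    ("auto", ["car", "engine", "automobile"]),
    ("auto", ["automobile", "engine", "road"]),
]
MODEL = train(DOCS)
print(classify(["automobile", "engine"], MODEL))
```

Because the per-document counts are merged by plain addition, the reduce step is associative and commutative, which is the property that lets a framework such as Hadoop combine partial results from many mappers in any order.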
