Improved Bayesian Text Classification Algorithm in Cloud Computing Environment

Abstract

Abstract: Drawing on cloud computing, this paper uses the MapReduce model to address the traditional Bayesian classification algorithm's poor fit for large-scale data, greatly increasing classification speed. The algorithm is also adapted to suit the characteristics of parallel execution. Combining synonym handling with word-frequency filtering reduces the dimensionality of the feature vector and the number of false positives, and weighting particular keywords further improves classification accuracy. Experiments on the Hadoop cloud computing platform show that the traditional text classification algorithm achieves good speedup once parallelized on Hadoop, and that the improved algorithm raises classification accuracy.
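The abstract does not give the algorithm itself, so the following is only a minimal single-process sketch of how the described steps might fit together, not the authors' implementation. It assumes that synonym handling means mapping synonyms to one canonical term, that word-frequency filtering means dropping terms below a corpus-wide count, and that keyword weighting scales a term's log-likelihood. The names `SYNONYMS`, `KEYWORD_WEIGHTS`, `map_doc`, `reduce_counts`, `train`, `classify`, and `min_freq` are all hypothetical, and the map and reduce functions only imitate the shape of a MapReduce job rather than calling Hadoop.

```python
import math
from collections import defaultdict

# Hypothetical synonym table and keyword weights (illustrative, not from the paper).
SYNONYMS = {"soccer": "football", "film": "movie"}
KEYWORD_WEIGHTS = {"football": 2.0}


def normalize(token):
    """Lower-case a token and map it to its canonical synonym, if any."""
    token = token.lower()
    return SYNONYMS.get(token, token)


def map_doc(label, text):
    """Map step: emit one document-count pair and one pair per token."""
    yield (("doc", label), 1)
    for token in text.split():
        yield (("term", label, normalize(token)), 1)


def reduce_counts(pairs):
    """Reduce step: sum the values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def train(docs, min_freq=1):
    """Count documents and terms per class, then filter rare terms."""
    counts = reduce_counts(
        kv for label, text in docs for kv in map_doc(label, text)
    )
    doc_counts = {}
    term_counts = defaultdict(dict)
    corpus_freq = defaultdict(int)
    for key, n in counts.items():
        if key[0] == "doc":
            doc_counts[key[1]] = n
        else:
            _, label, term = key
            term_counts[label][term] = n
            corpus_freq[term] += n
    # Word-frequency filter: keep terms seen at least min_freq times overall.
    vocab = {t for t, n in corpus_freq.items() if n >= min_freq}
    return doc_counts, term_counts, vocab


def classify(text, model):
    """Return the class with the highest weighted log-posterior score."""
    doc_counts, term_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        in_vocab = {t: c for t, c in term_counts[label].items() if t in vocab}
        denom = sum(in_vocab.values()) + len(vocab)  # Laplace smoothing
        score = math.log(n_docs / total_docs)
        for token in text.split():
            term = normalize(token)
            if term not in vocab:
                continue
            weight = KEYWORD_WEIGHTS.get(term, 1.0)
            score += weight * math.log((in_vocab.get(term, 0) + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In a real Hadoop job the pairs emitted by `map_doc` would be shuffled by key across nodes before the reduce step; here the generator simply feeds `reduce_counts` directly.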

Key words: Cloud computing, Text classification, Parallel, Hadoop

ZHANG Lin and SHAO Tian-hao. Improved Bayesian Text Classification Algorithm in Cloud Computing Environment[J]. Computer Science, 2014, 41(Z6): 339-342.

