%A PENG Zheng, WANG Ling-jiao, GUO Hua
%T Parallel Text Categorization of Random Forest
%0 Journal Article
%D 2018
%J Computer Science
%R 10.11896/j.issn.1002-137X.2018.12.023
%P 148-152
%V 45
%N 12
%U {https://www.jsjkx.com/CN/abstract/article_17800.shtml}
%8 2018-12-15
%X Text categorization is one of the core technologies of information retrieval. Because a single computer has limited computing performance and storage capacity, traditional text categorization methods are poorly suited to the big data era. It is therefore both practical and urgent to run text classification algorithms in parallel, improving their efficiency through data and task parallelism on the Spark big data platform. This paper proposes an improved random forest algorithm for imbalanced data. It reduces the impact of imbalanced data on random forests by under-sampling the majority-class samples and sampling the minority-class samples with replacement to form new training samples. The experimental results show that the new algorithm improves categorization accuracy for the minority classes when handling imbalanced data sets.