Computer Science, 2022, Vol. 49, Issue (6A): 206-210. doi: 10.11896/jsjkx.210500089

• Intelligent Computing

TI-FastText Automatic Goods Classification Algorithm

SHAO Xin-xin   

  1. Dalian Neusoft University of Information, Dalian, Liaoning 116023, China
  • Online: 2022-06-10 Published: 2022-06-08
  • About author: SHAO Xin-xin, born in 1980, postgraduate, assistant professor. Her main research interests include computer software and theory, and big data.
  • Supported by:
    Natural Science Foundation of Liaoning Province, China (2019-ZD-0354).

Abstract: To classify goods automatically from their title information, a Chinese goods classification algorithm based on TF-IDF (term frequency-inverse document frequency) and FastText is proposed. In this algorithm, FastText represents the lexicon as a prefix tree. TF-IDF filtering is then applied to the dictionary produced by the n-gram model, so that the mean of the input word-sequence vectors is biased toward entries with high discriminative power, which makes the model better suited to Chinese short-text classification. The FastText-based product classification algorithm is implemented and optimized on the Anaconda platform. Evaluation shows that the algorithm achieves high accuracy and can meet the needs of goods classification on e-commerce platforms.

Key words: Chinese short text classification, FastText, Goods classification, TF-IDF

CLC Number: 

  • TP391.9
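
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the word-level bigrams, the rule of keeping the top-scoring share of entries, and the toy vectors are all assumptions made for illustration. A real FastText model learns its vectors during training, which this sketch does not do.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the tokens plus their word-level n-grams (hypothetical helper)."""
    grams = list(tokens)
    grams += ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf_filter(docs, keep_ratio=0.7):
    """Keep the top `keep_ratio` share of entries, ranked by their best TF-IDF score.

    Assumption: low-scoring entries are the ones dropped from the dictionary.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / doc_freq[term])
            best[term] = max(best.get(term, 0.0), score)

    # Sort by score (descending), then alphabetically so ties are deterministic.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return set(ranked[:max(1, int(len(ranked) * keep_ratio))])


def mean_vector(entries, vectors, kept):
    """Average the vectors of the entries that survived filtering."""
    chosen = [vectors[e] for e in entries if e in kept and e in vectors]
    if not chosen:
        return None
    dim = len(chosen[0])
    return [sum(v[i] for v in chosen) / len(chosen) for i in range(dim)]


# Toy, already-segmented product titles (invented for this example).
titles = [
    ["red", "cotton", "shirt"],
    ["blue", "cotton", "shirt"],
    ["red", "leather", "shoes"],
]
docs = [ngrams(t) for t in titles]
kept = tfidf_filter(docs, keep_ratio=0.7)

# Toy 2-dimensional vectors standing in for learned embeddings.
toy_vectors = {"leather": [1.0, 0.0], "shoes": [0.0, 1.0], "red": [5.0, 5.0]}
print(sorted(kept))
print(mean_vector(docs[2], toy_vectors, kept))
```

In this toy corpus, entries that occur in several titles (such as "cotton" or "red") score lower and are dropped, so the averaged title vector is built only from the entries that distinguish one title from another.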