Computer Science ›› 2021, Vol. 48 ›› Issue (2): 87-92. doi: 10.11896/jsjkx.200700111

Special Issue: Big Data & Data Science

• Database & Big Data & Data Science •

Social E-commerce Text Classification Algorithm Based on BERT

LI Ke-yue1, CHEN Yi2, NIU Shao-zhang1   

  1 School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2 Mobile Big Data Center, Southeast Digital Economic Development Institute, Quzhou, Zhejiang 324000, China
  • Received: 2020-07-17 Revised: 2020-12-04 Online: 2021-02-15 Published: 2021-02-04
  • About author: LI Ke-yue, born in 1995, postgraduate. His main research interests include big data processing and machine learning.
    NIU Shao-zhang, born in 1963, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include digital image forensics and information security.

Abstract: With the rapid development of online shopping, transactions between online merchants and shoppers have generated a large amount of data with great analytical value. To determine more efficiently and accurately the category of product described in social e-commerce product texts, this paper proposes a social e-commerce text classification algorithm based on the BERT model. The algorithm uses the BERT pre-trained language model to produce sentence-level feature vector representations of social e-commerce text, and then feeds the resulting feature vectors into a targeted classifier for classification. The algorithm is verified on a social e-commerce text data set, and the results show that the F1 value of the trained model on the test set reaches up to 94.61%, which is 6% higher than that obtained with the BERT-based MRPC classification task. The proposed algorithm can therefore determine the type of goods described in a text more efficiently and accurately, which helps further analysis of online transaction data and the extraction of valuable information from massive data.
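
For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 value can be computed for a classification task. It assumes single-label predictions and macro averaging over classes; the abstract does not state which averaging scheme the paper uses, and the category names and predictions below are hypothetical, not taken from the paper's data set.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1      # correct prediction for this class
        else:
            fp[pred] += 1      # predicted class gains a false positive
            fn[gold] += 1      # gold class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 values (an assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical product categories, for illustration only.
gold = ["clothing", "clothing", "food", "food", "digital"]
pred = ["clothing", "food", "food", "food", "digital"]

print(per_class_f1(gold, pred))
print(macro_f1(gold, pred))
```

In this toy example one "clothing" text is misclassified as "food", which lowers the recall of "clothing" and the precision of "food", so both classes score below 1 while "digital" is unaffected.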

Key words: Bidirectional encoder, Feature extraction, Machine learning, Model building, Multi-label text classification

CLC Number: 

  • TP181