Computer Science, 2024, Vol. 51, Issue (11A): 240300150-6. doi: 10.11896/jsjkx.240300150

• Big Data & Data Science •

BEML: A Blended Learning Analysis Paradigm for Hidden Space Representation of Commodities

ZHENG Qijian, LIU Feng

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Online: 2024-11-16 Published: 2024-11-13
  • Corresponding author: LIU Feng (lsttoy@163.com)
  • Author contact: shange0403@163.com
  • About author: ZHENG Qijian, born in 2003, master, is a student member of CCF (No. N9988G). His main research interests include deep learning technology.
    LIU Feng, born in 1988, Ph.D., is a senior member of CCF (No. 93542S). His main research interests include deep learning technology and blockchain technology.
  • Supported by:
    Research Project of Shanghai Science and Technology Commission (20dz2260300) and the Special Fund for Interdisciplinary Talent Cultivation in Artificial Intelligence Enabled Psychology/Education (2024JCRC-10), School of Computer Science and Technology, East China Normal University.

Abstract: With the advent of the Internet economy, the efficient management of e-commerce platforms has drawn wide attention from both academia and industry. The accuracy and degree of automation of product classification directly affect user experience and operational efficiency. This study therefore examines the latent space representation of product information and proposes BEML, a blended learning analysis paradigm for product latent space representation. The framework combines bidirectional encoder representations from transformers (BERT) with traditional machine learning methods, aiming to improve the efficiency and accuracy of automated product classification through detailed analysis of the latent space of product information. The framework is compared against current mainstream deep learning and machine learning algorithms. Experimental results show that, on the Amazon online analysis dataset used in this study, the best BEML configuration achieves a macro-averaged F1 score of 85.79% and a micro-averaged F1 score of 84.73%. Both exceed the current best F1 score of 83.3%, setting a new state of the art (SOTA). Beyond its theoretical novelty, the framework has practical value for information management and automated processing in e-commerce, providing an efficient and reliable blended learning analysis paradigm for the sci-tech driven business field.
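The abstract reports both a macro-averaged and a micro-averaged F1 score. As a minimal sketch of how the two averages differ, assuming single-label multi-class classification (the abstract does not state the exact evaluation protocol), the following computes both from toy labels. The function name `f1_scores`, the category names, and the data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted but is wrong
            fn[t] += 1   # class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, so every sample weighs the same.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy data: two product categories with unequal support.
y_true = ["book", "book", "book", "book", "toy", "toy"]
y_pred = ["book", "book", "book", "toy", "toy", "toy"]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro_f1:.4f}, micro F1 = {micro_f1:.4f}")
```

Macro averaging gives small categories the same weight as large ones, while micro averaging is dominated by frequent categories, which is why the two figures can differ on an imbalanced product taxonomy.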

Key words: Latent space representation, Pre-trained BERT model, Automated commodity classification, Intelligent commodity classification, Sci-tech driven business

CLC Number: 

  • TP311