Computer Science, 2021, Vol. 48, Issue (4): 97-103. doi: 10.11896/jsjkx.200900053

• Database & Big Data & Data Science •

Customs Commodity HS Code Classification Integrating Text Sequence and Graph Information

DU Shao-hua1, WAN Huai-yu1, WU Zhi-hao1,2, LIN You-fang1,2

  1. 1 School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
    2 Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Beijing 100044, China
  • Received: 2020-06-24 Revised: 2020-10-04 Online: 2021-04-15 Published: 2021-04-09
  • Corresponding author: WAN Huai-yu (hywan@bjtu.edu.cn)
  • About author: DU Shao-hua, born in 1996, postgraduate. Her main research interests include text mining. (18120357@bjtu.edu.cn)
    WAN Huai-yu, born in 1981, Ph.D., associate professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include social network mining, text mining, user behavior analysis and spatial-temporal data mining.

Abstract: Customs commodity HS code classification is an important international procedure in the import and export trade of enterprises and individuals. It can be regarded as a text classification problem: given a description of a commodity, determine the category of the commodity as represented by its HS code. However, this task is more challenging than general text classification. First, commodity description texts are organized in a specific hierarchical structure. Second, they exhibit sequential features at two levels. Third, the key information in a description is scattered and the forms of description are diverse. Most existing text classification methods cannot consider these factors together to capture the key information in commodity description texts. This paper therefore proposes a Text Sequence and Graph Information combination Neural Network (TSGINN) for customs commodity HS code classification. TSGINN defines HS code classification as a subgraph classification problem on a word co-occurrence network, models the associations between non-contiguous words with a graph attention network, and captures multi-level sequential information with a hierarchical long short-term memory network that follows the hierarchical structure of the commodity text. Experiments on a real-world customs commodity dataset show that TSGINN outperforms other classification methods on HS code classification.

Key words: Customs commodity, Graph attention network, HS code, Multi-level sequential information, Text classification
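
The abstract frames each commodity description as a subgraph of a word co-occurrence network. The sketch below is only an illustration of that idea, not the authors' implementation: the sliding-window size, the function names `build_cooccurrence_graph` and `description_subgraph`, and the toy corpus are all assumptions introduced here. It counts co-occurrence edges over a small corpus and then keeps the edges whose endpoints both appear in one description.

```python
from collections import Counter


def build_cooccurrence_graph(corpus, window=2):
    """Count undirected edges between words that occur within `window` positions.

    `window` is an assumed hyperparameter; the abstract does not specify one.
    """
    edges = Counter()
    for tokens in corpus:
        for i, word in enumerate(tokens):
            # Pair the current word with the next `window` words.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    edges[frozenset((word, other))] += 1
    return edges


def description_subgraph(edges, tokens):
    """Keep only the edges whose two endpoints both occur in one description."""
    nodes = set(tokens)
    return {edge: weight for edge, weight in edges.items() if edge <= nodes}


# Toy corpus of tokenized commodity descriptions (hypothetical data).
corpus = [
    ["cotton", "shirt", "men", "knitted"],
    ["cotton", "shirt", "women", "woven"],
    ["steel", "pipe", "seamless"],
]
graph = build_cooccurrence_graph(corpus, window=2)
sub = description_subgraph(graph, ["cotton", "knitted", "shirt"])
```

In this toy run, `sub` links "shirt" to "knitted" even though the two words are not adjacent in the first corpus entry, which is the kind of non-contiguous association a graph model can then attend over. A classifier such as a graph attention network would take the subgraph as input; that step is omitted here.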

CLC Number:

  • TP391