Computer Science ›› 2021, Vol. 48 ›› Issue (4): 97-103. doi: 10.11896/jsjkx.200900053
杜少华1, 万怀宇1, 武志昊1,2, 林友芳1,2
DU Shao-hua1, WAN Huai-yu1, WU Zhi-hao1,2, LIN You-fang1,2
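Abstract: HS code classification of customs commodities is an important international procedure in import and export trade for both enterprises and individuals. HS code classification can be treated as a text classification problem: given a textual description of a commodity, determine the category, represented by an HS code, to which the commodity belongs. This task is more challenging than general text classification, however, for three reasons: commodity description texts have a specific hierarchical structure, they exhibit sequential features at two levels, and their key information is scattered and expressed in varied forms. Existing text classification methods cannot consider these factors together to capture the key information in commodity description texts. To address this, the paper proposes a Text Sequence and Graph Information combination Neural Network (TSGINN) model for customs commodity HS code classification. TSGINN formulates HS code classification as a subgraph classification problem on a word co-occurrence network. It uses a graph attention network to model the associations between non-contiguous words, and a hierarchical long short-term memory network, combined with the hierarchical structure of the commodity text, to capture multi-level sequential information. Experiments on a real customs commodity dataset show that TSGINN outperforms other classification methods on HS code classification.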
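The abstract does not say how the word co-occurrence network is built, so the following is only a minimal sketch of one common construction, not the authors' procedure. It assumes a pre-tokenized description and a sliding window; the function name `description_subgraph`, the `window` parameter, and the sample tokens are all hypothetical.

```python
from collections import Counter


def description_subgraph(tokens, window=3):
    """Return (nodes, edges) for one commodity description.

    Two distinct words are linked when they occur fewer than `window`
    positions apart; the edge weight counts how often that happens.
    `window` is an illustrative assumption, not a value from the paper.
    """
    edges = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at the next (window - 1) tokens only.
        for right in tokens[i + 1:i + window]:
            if left != right:
                # Sort the pair so the edge is undirected.
                edges[tuple(sorted((left, right)))] += 1
    return set(tokens), edges


# Hypothetical, pre-tokenized description (not taken from the dataset).
tokens = ["stainless", "steel", "kitchen", "knife"]
nodes, edges = description_subgraph(tokens)
print(sorted(nodes))
print(sorted(edges.items()))
```

With a window of 3, "stainless" and "kitchen" are linked even though they are not adjacent, which is the kind of non-contiguous word association that a graph attention layer could then weight. Classifying the description would amount to classifying this subgraph.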