Computer Science (计算机科学), 2021, Vol. 48, Issue (1): 247-252. doi: 10.11896/jsjkx.191200088
ZHANG Yu-shuai (张玉帅), ZHAO Huan (赵欢), LI Bo (李博)
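Abstract: Slot filling is a key task in dialogue systems. It assigns the correct label to every word of an input sentence, and its accuracy strongly affects the downstream dialogue management module. Current deep learning approaches to this task generally initialize the model with either random word vectors or pre-trained word vectors. Random word vectors, however, carry no semantic or syntactic information, while pre-trained word vectors assign a single vector to each word ("one word, one meaning") and therefore cannot provide context-dependent representations. To address this problem, this paper proposes a deep learning model built on the pre-trained model BERT and long short-term memory networks. The model uses Bidirectional Encoder Representations from Transformers (BERT) to produce context-dependent word vectors, feeds them into a Bidirectional Long Short-Term Memory (BiLSTM) network, and finally decodes the output with a Softmax function and a conditional random field (CRF). BERT and the BiLSTM network are trained jointly as a single model, which improves slot filling performance. On the MIT Restaurant Corpus, the MIT Movie Corpus and the MIT Movie Trivia Corpus, the proposed model achieves maximum F1 scores of 78.74%, 87.60% and 71.54%, respectively. The experimental results show that the proposed model significantly improves the F1 score on the slot filling task.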
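The abstract states that the BiLSTM outputs are decoded with Softmax and a conditional random field, and that results are reported as F1. The Python sketch below illustrates both steps under stated assumptions; it is not the authors' code. The tag set (`O`, `B-Cuisine`, `I-Cuisine`), the toy scores, and the helper names `viterbi_decode`, `bio_spans` and `span_f1` are hypothetical, and the emission matrix merely stands in for the per-token scores a BiLSTM would produce from BERT vectors. It shows standard Viterbi decoding for a linear-chain CRF, where transition scores can rule out invalid tag sequences that a per-token Softmax/argmax would allow, followed by span-level F1, a common way to score slot filling (the abstract does not say which F1 variant the paper uses).

```python
"""Toy linear-chain CRF decoding plus span-level F1 for BIO slot tags.

Hypothetical illustration: tag names and scores are made up, and the
emission scores stand in for a BiLSTM's per-token outputs.
"""
from typing import List, Optional, Sequence, Set, Tuple

Span = Tuple[str, int, int]  # (slot label, start index, end index exclusive)


def viterbi_decode(emissions: List[List[float]], transitions: List[List[float]],
                   start: List[float], end: List[float]) -> List[int]:
    """Return the highest-scoring tag-index path under a linear-chain CRF.

    emissions[t][j]   score of tag j at token t (e.g. a BiLSTM output)
    transitions[i][j] score of moving from tag i to tag j
    start[j], end[j]  scores for starting / ending the sentence in tag j
    """
    n_tags = len(start)
    # Best score of any path ending in each tag at token 0.
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i from which to move into tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Add end scores, then follow back-pointers from the best final tag.
    final = [score[j] + end[j] for j in range(n_tags)]
    path = [max(range(n_tags), key=lambda j: final[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect (label, start, end) slot spans from a BIO tag sequence.

    An I- tag that does not continue a span of the same label is treated
    as opening a new span (a lenient convention).
    """
    spans: Set[Span] = set()
    label: Optional[str] = None
    begin = 0
    for pos, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:
                spans.add((label, begin, pos))
            label = None
            if tag != "O":
                label, begin = tag[2:], pos
    return spans


def span_f1(gold_seqs: Sequence[Sequence[str]], pred_seqs: Sequence[Sequence[str]]) -> float:
    """Micro-averaged span-level F1, pooling counts over all sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)  # a span counts only if label and boundaries both match
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


TAGS = ["O", "B-Cuisine", "I-Cuisine"]
NEG = -10.0  # hand-set penalty for illegal BIO moves (learned in a real CRF)
# Hypothetical emission scores for "any thai restaurants"; a greedy argmax
# would pick O, I-Cuisine, O, which is not a valid BIO sequence.
EMISSIONS = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [2.0, 0.0, 0.0]]
TRANSITIONS = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # O -> I-Cuisine penalized
START = [0.0, 0.0, NEG]  # starting a sentence with I-Cuisine is penalized
END = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    best = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS, START, END)]
    print(best)  # ['O', 'B-Cuisine', 'O']
    # One of two gold slots found, no false positives: P = 1.0, R = 0.5, F1 about 0.667
    print(span_f1([["O", "B-Cuisine", "O", "B-Location"]], [["O", "B-Cuisine", "O", "O"]]))
```

With these toy scores, a per-token argmax would choose `I-Cuisine` for the second token directly after `O`; the large negative `O -> I-Cuisine` transition makes Viterbi return `O, B-Cuisine, O` instead. In a trained BiLSTM-CRF the transition matrix is learned together with the network rather than set by hand.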