Computer Science ›› 2021, Vol. 48 ›› Issue (1): 247-252. doi: 10.11896/jsjkx.191200088

• Artificial Intelligence •

Semantic Slot Filling Based on BERT and BiLSTM

ZHANG Yu-shuai, ZHAO Huan, LI Bo   

  1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
  • Received: 2019-12-13 Revised: 2020-05-01 Online: 2021-01-15 Published: 2021-01-15
  • Corresponding author: ZHAO Huan (hzhao@hnu.edu.cn)
  • About author: ZHANG Yu-shuai, born in 1993, master (zhangyushuai@hnu.edu.cn), is a member of China Computer Federation. His main research interest is natural language processing.
    ZHAO Huan, born in 1967, Ph.D., professor, is a member of China Computer Federation. Her main research interests include speech information processing, natural language processing and intelligent computing.
  • Supported by:
    National Key R&D Program of China (2018YFC0831800).

Abstract: Semantic slot filling is an important task in dialogue systems, which aims to assign the correct label to each word of an input sentence. Slot filling performance has a marked impact on the subsequent dialogue management module. At present, deep learning models for this task are usually initialized with random word vectors or pre-trained word vectors. However, random word vectors carry no semantic or syntactic information, and conventional pre-trained word vectors assign only one meaning per word; neither can provide the model with context-dependent word vectors. To address this problem, this paper proposes an end-to-end neural network model based on the pre-trained model BERT and the Long Short-Term Memory network (LSTM). First, the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model encodes the input sentence into context-dependent word embeddings. These embeddings then serve as input to a Bidirectional Long Short-Term Memory network (BiLSTM). Finally, a Softmax function and a conditional random field (CRF) are used to decode the predicted labels. The pre-trained BERT model and the BiLSTM network are trained as a whole in order to improve the performance of the semantic slot filling task. The model achieves maximum F1 scores of 78.74%, 87.60% and 71.54% on three datasets (MIT Restaurant Corpus, MIT Movie Corpus and MIT Movie trivial Corpus), respectively. The experimental results show that the proposed model significantly improves the F1 score of the semantic slot filling task.
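To make the described pipeline concrete, below is a minimal PyTorch sketch of the architecture the abstract outlines (BERT encoder → BiLSTM → per-token emission scores → CRF decoding). It is an illustrative reconstruction under stated assumptions, not the authors' code: it assumes the Hugging Face transformers package for the pre-trained BERT encoder and the third-party pytorch-crf package for the CRF layer, and the hidden size and label count are placeholders.

```python
# Illustrative sketch (not the authors' code) of the BERT+BiLSTM+CRF
# slot filler described in the abstract. Assumes PyTorch, the Hugging
# Face `transformers` package for pre-trained BERT, and the third-party
# `pytorch-crf` package (`pip install pytorch-crf`) for the CRF layer.
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

class BertBiLstmCrf(nn.Module):
    def __init__(self, num_slot_labels, lstm_hidden=256,
                 bert_name="bert-base-uncased"):
        super().__init__()
        # BERT supplies context-dependent word embeddings, unlike static
        # pre-trained embeddings that fix one vector per word.
        self.bert = BertModel.from_pretrained(bert_name)
        # BiLSTM reads the contextual embeddings in both directions.
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Linear layer produces per-token emission scores over slot labels.
        self.emissions = nn.Linear(2 * lstm_hidden, num_slot_labels)
        # CRF decodes a globally consistent label sequence.
        self.crf = CRF(num_slot_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        # BERT and the BiLSTM are fine-tuned jointly (end-to-end),
        # as the abstract states.
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        hidden, _ = self.bilstm(hidden)
        scores = self.emissions(hidden)            # (batch, seq, num_labels)
        mask = attention_mask.bool()
        if labels is not None:                     # training: negative log-likelihood
            return -self.crf(scores, labels, mask=mask)
        return self.crf.decode(scores, mask=mask)  # inference: best label paths
```

In practice one must also align BERT's WordPiece sub-tokens with the word-level slot labels (for example, by scoring only each word's first sub-token); the abstract does not specify these details.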

Key words: Context-dependent, Long short-term memory network, Pre-trained model, Slot filling, Word embedding

CLC Number: TP391