Semantic Slot Filling Based on BERT and BiLSTM

doi:10.11896/jsjkx.191200088

Abstract

Semantic slot filling is an important task in dialogue systems; it aims to assign the correct label to each word of the input sentence. Slot filling performance has a marked impact on the downstream dialogue management module. At present, deep learning models for slot filling are usually initialized with either random word vectors or pre-trained word vectors. However, random word vectors carry no semantic or grammatical information, and a pre-trained word vector represents only one meaning per word, so neither can provide the model with context-dependent word vectors. We propose an end-to-end neural network model based on the pre-trained model BERT and the Long Short-Term Memory network (LSTM). First, the pre-trained BERT model encodes the input sentence into context-dependent word embeddings. These embeddings then serve as input to a Bidirectional Long Short-Term Memory network (BiLSTM). Finally, the Softmax function and a conditional random field (CRF) are used to decode the predicted labels. BERT and the BiLSTM network are trained jointly as a whole to improve slot filling performance. The model achieves F1 scores of 78.74%, 87.60% and 71.54% on three datasets (MIT Restaurant Corpus, MIT Movie Corpus and MIT Movie Trivia Corpus), respectively. The experimental results show that our model significantly improves the F1 score on the semantic slot filling task.
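The abstract names two ways of decoding labels from the BiLSTM output: a Softmax layer and a conditional random field. The Python sketch below is illustrative only, not the authors' code, and contrasts the two on one sentence. Assumptions: the per-token emission scores stand in for BiLSTM outputs projected to tag space, the three BIO tags and every number are hypothetical, and the transition scores are set by hand to penalise "O followed by I-Cuisine" and a sentence-initial I-Cuisine, whereas a trained CRF learns its transition scores from data.

```python
from typing import List, Sequence, Tuple

# Hypothetical BIO tag set; not taken from the paper.
TAGS = ["O", "B-Cuisine", "I-Cuisine"]
NEG = -10.0  # hand-set penalty for transitions that break the BIO scheme

# TRANSITIONS[i][j]: score of moving from tag i to tag j (hand-set, not learned)
TRANSITIONS = [
    [0.0, 0.0, NEG],  # from O: O -> I-Cuisine is penalised
    [0.0, 0.0, 0.0],  # from B-Cuisine
    [0.0, 0.0, 0.0],  # from I-Cuisine
]
START = [0.0, 0.0, NEG]  # a sentence may not start with I-Cuisine

# Hypothetical emission scores for the tokens "find thai food"
EMISSIONS = [
    [2.0, 0.5, 0.1],  # "find"
    [0.3, 1.0, 1.2],  # "thai"
    [2.0, 0.1, 0.2],  # "food"
]


def greedy_decode(emissions: Sequence[Sequence[float]]) -> List[int]:
    """Per-token argmax, i.e. what a Softmax output layer predicts on its own.

    Softmax is monotonic, so its argmax equals the argmax of the raw scores.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


def path_score(path, emissions, transitions, start) -> float:
    """Linear-chain CRF score of one tag sequence."""
    score = start[path[0]] + sum(emissions[t][y] for t, y in enumerate(path))
    score += sum(transitions[a][b] for a, b in zip(path, path[1:]))
    return score


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the tag path with the highest CRF score, and that score."""
    num_tags = len(start)
    # best[j]: best score of any path that ends in tag j at the current token
    best = [start[j] + emissions[0][j] for j in range(num_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Previous tag i maximising best[i] + transitions[i][j]
            i = max(range(num_tags), key=lambda k: best[k] + transitions[k][j])
            new_best.append(best[i] + transitions[i][j] + emissions[t][j])
            pointers.append(i)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag
    last = max(range(num_tags), key=best.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    best_path, _ = viterbi_decode(EMISSIONS, TRANSITIONS, START)
    print([TAGS[i] for i in greedy_decode(EMISSIONS)])  # ['O', 'I-Cuisine', 'O']
    print([TAGS[i] for i in best_path])  # ['O', 'B-Cuisine', 'O']
```

Because the per-token argmax ignores neighbouring labels, it can emit an invalid sequence such as O, I-Cuisine, O; the transition scores lead Viterbi decoding to O, B-Cuisine, O instead. The abstract does not say whether the paper's CRF uses such hard constraints; in a trained CRF the equivalent preferences would come from learned transition scores.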

Key words: Context-dependent, Long short-term memory network, Pre-trained model, Slot filling, Word embedding

CLC Number: TP391

ZHANG Yu-shuai, ZHAO Huan, LI Bo. Semantic Slot Filling Based on BERT and BiLSTM [J]. Computer Science, 2021, 48(1): 247-252.

Related Articles

[1]	WANG Xin-tong, WANG Xuan, SUN Zhi-xin. Network Traffic Anomaly Detection Method Based on Multi-scale Memory Residual Network [J]. Computer Science, 2022, 49(8): 314-322.
[2]	HOU Yu-tao, ABULIZI Abudukelimu, ABUDUKELIMU Halidanmu. Advances in Chinese Pre-training Models [J]. Computer Science, 2022, 49(7): 148-163.
[3]	WANG Shan, XU Chu-yi, SHI Chun-xiang, ZHANG Ying. Study on Cloud Classification Method of Satellite Cloud Images Based on CNN-LSTM [J]. Computer Science, 2022, 49(6A): 675-679.
[4]	HAN Hong-qi, RAN Ya-xin, ZHANG Yun-liang, GUI Jie, GAO Xiong, YI Meng-lin. Study on Cross-media Information Retrieval Based on Common Subspace Classification Learning [J]. Computer Science, 2022, 49(5): 33-42.
[5]	PAN Zhi-hao, ZENG Bi, LIAO Wen-xiong, WEI Peng-fei, WEN Song. Interactive Attention Graph Convolutional Networks for Aspect-based Sentiment Classification [J]. Computer Science, 2022, 49(3): 294-300.
[6]	LI Yu-qiang, ZHANG Wei-jiang, HUANG Yu, LI Lin, LIU Ai-hua. Improved Topic Sentiment Model with Word Embedding Based on Gaussian Distribution [J]. Computer Science, 2022, 49(2): 256-264.
[7]	LIU Kai, ZHANG Hong-jun, CHEN Fei-qiong. Name Entity Recognition for Military Based on Domain Adaptive Embedding [J]. Computer Science, 2022, 49(1): 292-297.
[8]	LI Zhao-qi, LI Ta. Query-by-Example with Acoustic Word Embeddings Using wav2vec Pretraining [J]. Computer Science, 2022, 49(1): 59-64.
[9]	TANG Shi-zheng, ZHANG Yan-feng. DragDL:An Easy-to-Use Graphical DL Model Construction System [J]. Computer Science, 2021, 48(8): 220-225.
[10]	WANG Sheng, ZHANG Yang-sen, CHEN Ruo-yu, XIANG Ga. Text Matching Method Based on Fine-grained Difference Features [J]. Computer Science, 2021, 48(8): 60-65.
[11]	YU Sheng, LI Bin, SUN Xiao-bing, BO Li-li, ZHOU Cheng. Approach for Knowledge-driven Similar Bug Report Recommendation [J]. Computer Science, 2021, 48(5): 91-98.
[12]	PENG Bin, LI Zheng, LIU Yong, WU Yong-hao. Automatic Code Comments Generation Method Based on Convolutional Neural Network [J]. Computer Science, 2021, 48(12): 117-124.
[13]	HUANG Xin, LEI Gang, CAO Yuan-long, LU Ming-ming. Review on Interactive Question Answering Techniques Based on Deep Learning [J]. Computer Science, 2021, 48(12): 286-296.
[14]	ZHANG Ning, FANG Jing-wen, ZHAO Yu-xuan. Bitcoin Price Forecast Based on Mixed LSTM Model [J]. Computer Science, 2021, 48(11A): 39-45.
[15]	TIAN Ye, SHOU Li-dan, CHEN Ke, LUO Xin-yuan, CHEN Gang. Natural Language Interface for Databases with Content-based Table Column Embeddings [J]. Computer Science, 2020, 47(9): 60-66.
