Computer Science ›› 2020, Vol. 47 ›› Issue (1): 193-198. doi: 10.11896/jsjkx.181202261

• Artificial Intelligence •

Chinese Short Text Keyphrase Extraction Model Based on Attention

YANG Dan-hao, WU Yue-xin, FAN Chun-xiao

  1. (School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100089, China)
  • Received: 2018-12-05 Published: 2020-01-19
  • About author: YANG Dan-hao, born in 1994, master. His main research interests include natural language processing; FAN Chun-xiao, born in 1962, professor. Her main research interests include artificial intelligence and the Internet of Things.

Abstract: Keyphrase extraction is a research hotspot in natural language processing. Among current keyphrase extraction algorithms, deep learning methods seldom take the characteristics of Chinese into account, character-granularity information is not fully utilized, and the extraction of keyphrases from Chinese short texts still leaves considerable room for improvement. To improve keyphrase extraction for short texts, a model for automatic keyphrase extraction from abstracts, the BAST model, is proposed; it combines bidirectional long short-term memory (BiLSTM) and an attention mechanism within a sequence tagging model. First, word vectors at word granularity and character vectors at character granularity are used to represent the input text. Second, the BAST model is trained: text features are extracted with BiLSTM and the attention mechanism, and a label is predicted for each word. Finally, the character vector model is used to correct the extraction results of the word vector model. Experimental results show that the F1-measure of the BAST model reaches 66.93% on 8159 abstracts, which is 2.08% higher than that of the BiLSTM-CRF (Bidirectional Long Short-Term Memory and Conditional Random Field) algorithm and also improves on traditional keyphrase extraction algorithms. The innovation of the model lies in combining the extraction results of the word vector model and the character vector model. The model makes full use of the characteristics of Chinese text and can effectively extract keyphrases from short texts.

Key words: Attention mechanism, Word embedding, Character embedding, Keyphrase extraction, LSTM

CLC Number: 

  • TP391
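
The abstract frames keyphrase extraction as sequence tagging and reports results as an F1-measure. The following is a minimal sketch of those two steps, not the authors' implementation: it assumes a B/I/O tag scheme (the abstract does not name one), and the function names `decode_keyphrases` and `f1_measure` as well as the sample tokens are hypothetical. It does not model the BiLSTM, the attention mechanism, or the character-level correction step.

```python
from typing import List, Set, Tuple


def decode_keyphrases(tokens: List[str], tags: List[str]) -> List[str]:
    """Turn per-token tags into keyphrases.

    Assumed scheme (not stated in the abstract): "B" starts a phrase,
    "I" continues it, "O" is outside any phrase.
    """
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                phrases.append("".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O", or a stray "I" with no open phrase, closes the phrase.
            if current:
                phrases.append("".join(current))
            current = []
    if current:
        phrases.append("".join(current))
    return phrases


def f1_measure(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact phrase matching."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical word-segmented input and predicted tags.
tokens = ["基于", "注意力", "机制", "的", "关键词", "提取", "模型"]
tags = ["O", "B", "I", "O", "B", "I", "O"]
predicted = decode_keyphrases(tokens, tags)
precision, recall, f1 = f1_measure(set(predicted), {"注意力机制", "关键词提取", "短文本"})
print(predicted, precision, recall, f1)
```

Tokens are joined without a separator because Chinese text is written without spaces between words; the same decoder works on character-level input, where each token is a single character.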
