Computer Science ›› 2019, Vol. 46 ›› Issue (6A): 56-59.

Automatic Extraction of Diverse Keyphrases Using Integer Linear Programming

LI Shan-shan, CHEN Li, TANG Yu-ting, WANG Yi-lin, YU Zhong-hua   

  1. College of Computer Science, Sichuan University, Chengdu 610065, China
Abstract: Keyphrases are concise summaries of a text that represent its main topics and core ideas, and automatic keyphrase extraction is an important task in natural language processing and information retrieval. To address the semantic over-generation that unsupervised methods produce among candidate phrases, this paper proposes a keyphrase extraction algorithm based on integer linear programming (ILP) and the similarity between candidate phrases: candidate phrases with high semantic similarity are penalized when maximizing the objective function, so that the selected keyphrases are diverse. The TextRank and TFIDF algorithms are used to generate candidate phrases on two different corpora, and the proposed optimization algorithm is then used to optimize the weight scores of the candidate phrases. Finally, the results of the proposed optimization algorithm are compared with those of the baseline methods. The experimental results show that the proposed method can effectively solve the semantic over-generation problem by penalizing candidate phrases with high semantic similarity. Moreover, the optimization algorithm obtains more diverse keyphrases, and its precision (P), recall (R) and F values outperform those of the baseline methods.

Key words: Automatic keyphrase extraction, Integer linear programming, Semantic over-generation, Diversity
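
The selection idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual objective or constraints, so the formulation here is an assumption: choose exactly k phrases to maximize the sum of their weight scores minus a penalty on the pairwise similarity of the chosen phrases. The function name `select_diverse`, the `penalty` parameter, and the toy scores and similarities are all hypothetical. The 0-1 program is solved by exhaustive enumeration rather than by an ILP solver, which is only practical for very small candidate sets.

```python
from itertools import combinations


def select_diverse(scores, sim, k, penalty=1.0):
    """Choose k phrases maximizing total score minus penalized pairwise similarity.

    Assumed objective (not taken from the paper):
        sum_i w_i * x_i  -  penalty * sum_{i<j} s_ij * x_i * x_j,  with sum_i x_i = k
    scores: dict mapping phrase -> weight score (e.g. from TextRank or TFIDF)
    sim:    dict mapping frozenset({phrase_a, phrase_b}) -> similarity in [0, 1]
    """
    phrases = list(scores)
    if not 0 < k <= len(phrases):
        raise ValueError("k must be between 1 and the number of candidate phrases")

    best_set, best_value = None, float("-inf")
    # Enumerate every feasible 0-1 assignment with exactly k selected phrases.
    for subset in combinations(phrases, k):
        relevance = sum(scores[p] for p in subset)
        redundancy = sum(
            sim.get(frozenset(pair), 0.0) for pair in combinations(subset, 2)
        )
        value = relevance - penalty * redundancy
        if value > best_value:
            best_set, best_value = subset, value
    return set(best_set), best_value


# Toy data (hypothetical): two near-duplicate phrases carry the highest scores.
example_scores = {
    "neural network": 0.9,
    "neural networks": 0.85,
    "keyphrase extraction": 0.7,
    "graph ranking": 0.5,
}
example_sim = {frozenset(("neural network", "neural networks")): 0.95}

chosen, value = select_diverse(example_scores, example_sim, k=2)
print(sorted(chosen), round(value, 3))
```

With the penalty set to zero, the two near-duplicate phrases are selected together because they have the highest scores. With the penalty active, one of them is replaced by a less similar phrase, which is the diversity effect the abstract describes.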

