Computer Science ›› 2013, Vol. 40 ›› Issue (12): 264-269.


Extraction Algorithm Based on Semantic Expansion Integrated with Lexical Chain

LIU Duan-yang and WANG Liang-fang   

  • Online: 2018-11-16 Published: 2018-11-16

Abstract: Keyword extraction quality suffers from polysemy, synonymy, and the difficulty of expressing the subjects of a text accurately and comprehensively. To address these problems, a semantics-based keyword extraction method named KESELC is proposed. Semantic similarity and semantic relevancy are computed from the Tongyici Cilin thesaurus and statistical information, and on this basis the concept of semantic expansion and its calculation method are introduced. By combining semantic expansion with lexical chains, the method processes text through preprocessing, polysemy disambiguation, synonym merging, lexical chain construction, feature selection, and an improved weight computation. The extracted keywords avoid redundant expression while covering the subjects of the article accurately and comprehensively. Experimental results show that KESELC performs better than keyword extraction methods based on TFIDF and on lexical chains, and that it has some practical value.

Key words: Tongyici Cilin, Semantic expansion, Lexical chain, Keyword extraction, Semantic analysis
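
The abstract names the stages of the pipeline but gives no formulas, so the following is only a minimal sketch of the general idea (thesaurus-based similarity, synonym merging into lexical chains, and chain-based keyword selection), not the KESELC method itself. The category codes in `TOY_CODES`, the prefix-overlap similarity, the greedy chain construction, the 0.7 threshold, and the frequency-based chain score are all assumptions made for illustration; they are not taken from the paper or from the real Tongyici Cilin.

```python
from collections import Counter

# Hypothetical toy thesaurus: word -> category code. These codes are invented
# stand-ins for thesaurus entries, not real Tongyici Cilin data.
TOY_CODES = {
    "car": "Bo21A",
    "automobile": "Bo21A",
    "truck": "Bo21B",
    "engine": "Bo25A",
    "banana": "Bh07A",
}


def similarity(w1, w2, codes=TOY_CODES):
    """Assumed similarity: share of leading code characters two words have in common."""
    if w1 == w2:
        return 1.0
    c1, c2 = codes.get(w1), codes.get(w2)
    if c1 is None or c2 is None:
        return 0.0
    shared = 0
    for a, b in zip(c1, c2):
        if a != b:
            break
        shared += 1
    return shared / max(len(c1), len(c2))


def build_chains(words, threshold=0.7):
    """Greedily group distinct words into chains of mutually related terms."""
    chains = []
    for word in dict.fromkeys(words):  # distinct words, first-appearance order
        for chain in chains:
            if max(similarity(word, member) for member in chain) >= threshold:
                chain.append(word)
                break
        else:
            chains.append([word])  # no chain was close enough: start a new one
    return chains


def extract_keywords(words, top_k=2, threshold=0.7):
    """Score each chain by total member frequency; return one keyword per chain."""
    counts = Counter(words)
    scored = []
    for chain in build_chains(words, threshold):
        score = sum(counts[w] for w in chain)
        representative = max(chain, key=lambda w: counts[w])
        scored.append((score, representative))
    scored.sort(key=lambda item: -item[0])  # stable sort keeps chain order on ties
    return [word for _, word in scored[:top_k]]


tokens = ["car", "automobile", "truck", "engine", "banana", "car", "truck"]
print(build_chains(tokens))
print(extract_keywords(tokens))
```

Returning one representative per chain is what keeps near-synonyms such as "car" and "automobile" from both appearing in the output, which mirrors the abstract's goal of avoiding redundant keywords. A real system would replace the toy similarity with scores derived from the thesaurus and corpus statistics.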

