Computer Science, 2013, Vol. 40, Issue (12): 264-269.
刘端阳,王良芳
LIU Duan-yang and WANG Liang-fang
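Abstract: Keyword extraction quality suffers from polysemy, synonymy, and the difficulty of expressing an article's topic accurately and comprehensively. To address these problems, this paper proposes KESELC, a semantics-based keyword extraction algorithm. It uses the Tongyici Cilin (《同义词词林》) semantic dictionary and statistical information to compute semantic similarity and relatedness, from which a semantic extension degree and its calculation method are derived. Combining the semantic extension degree with the lexical chain method, the algorithm processes a text in six steps: preprocessing, word sense disambiguation of polysemous words, synonym merging, lexical chain construction, effective feature selection, and comprehensive weight calculation. The extracted keywords avoid redundant synonymous expressions and cover the topic of the text fairly accurately and comprehensively. Comparative experiments verify that the KESELC-based method extracts keywords better than both the TFIDF-based method and the lexical-chain-based method, and that it has some practical application value.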
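The abstract names the processing steps but gives no formulas, so the following is only a minimal sketch of three of them (synonym merging, lexical chain construction, and weighting), not the KESELC algorithm itself. Everything in it is an assumption made for illustration: the `TOY_CODES` entries are invented Cilin-style codes rather than real dictionary entries, `similarity` is a simplified shared-prefix measure, and the weight (word frequency times chain frequency) stands in for the paper's comprehensive weight. Word sense disambiguation and the semantic extension degree are not modelled.

```python
from collections import Counter

# Hypothetical Cilin-style codes, invented for this sketch (not real entries).
TOY_CODES = {
    "computer": "Aa01A01",
    "PC": "Aa01A01",       # same code as "computer": treated as a synonym
    "laptop": "Aa01A02",
    "software": "Aa01B01",
    "banana": "Bb02A01",
}

# Prefix lengths that close each of the five hierarchy levels in a code.
LEVEL_ENDS = (1, 2, 4, 5, 7)


def similarity(a, b):
    """Fraction of leading code levels two words share (a simplification)."""
    code_a, code_b = TOY_CODES[a], TOY_CODES[b]
    shared = 0
    for end in LEVEL_ENDS:
        if code_a[:end] != code_b[:end]:
            break
        shared += 1
    return shared / len(LEVEL_ENDS)


def extract_keywords(tokens, threshold=0.5, top_k=2):
    # Step 1, synonym merging: words with identical codes share one
    # representative (the first one seen), and their counts are pooled.
    representative = {}
    freq = Counter()
    for token in tokens:
        if token not in TOY_CODES:
            continue  # words outside the toy dictionary are ignored
        head = representative.setdefault(TOY_CODES[token], token)
        freq[head] += 1

    # Step 2, lexical chains: attach each word to the first chain whose
    # head word is similar enough, otherwise start a new chain.
    chains = []
    for word in freq:
        for chain in chains:
            if similarity(word, chain[0]) >= threshold:
                chain.append(word)
                break
        else:
            chains.append([word])

    # Step 3, weighting (assumed formula): own frequency times the total
    # frequency of the chain the word belongs to.
    weight = {}
    for chain in chains:
        strength = sum(freq[w] for w in chain)
        for w in chain:
            weight[w] = freq[w] * strength

    ranked = sorted(weight, key=lambda w: (-weight[w], w))
    return ranked[:top_k]


tokens = ["computer", "PC", "laptop", "software", "banana", "computer"]
keywords = extract_keywords(tokens)
```

In this toy input, "PC" is folded into "computer" before chains are built, so the synonym cannot appear as a separate keyword. This mirrors the abstract's claim that the extracted keywords avoid redundant synonymous expressions.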