计算机科学 ›› 2019, Vol. 46 ›› Issue (6A): 56-59.
李珊珊, 陈黎, 唐裕婷, 王艺霖, 于中华
LI Shan-shan, CHEN Li, TANG Yu-ting, WANG Yi-lin, YU Zhong-hua
摘要: 关键短语是文本信息的精简概括,能够代表文本的主题和核心观点。而关键短语的自动抽取更是自然语言处理和信息检索的重要任务之一。针对目前无监督方法自动抽取关键短语存在过度生成候选短语语义的问题,提出了一种将整数线性规划和短语语义相似度相结合的自动抽取算法。通过惩罚语义相似度高的候选短语实现目标函数的最大化,以此形成多样性的关键短语。实验利用TextRank和TFIDF算法在两种不同的语料集中分别产生候选短语,并利用提出的优化算法对候选短语的权值得分进行优化。最后将所提算法产生的优化结果与现有多个算法的结果进行了比较。实验结果表明,通过加入相似性度量的惩罚能够有效解决语义过度问题,并获取更多样的关键短语,其优化结果的P,R和F值均高于其他算法。
中图分类号:
[1]BARKER K,CORNACCHIA N.Using Noun Phrase Heads to Extract Document Keyphrases[C]∥Advances in Artificial Intelligence.Springer Berlin Heidelberg,2000:40-52. [2]EKMAN P.An argument for basic emotions[J].Cogition and emotion,1992,6(3-4):169-200. [3]CARAGEA C,BULGAROV F A,GODEA A,et al.Citation-Enhanced Keyphrase Extraction from Research Papers:A Supervised Approach[C]∥Conference on Empirical Methods in Natural Language Processing.2014:1435-1446. [4]MIHALCEA R,TARAU P.TextRank:Bringing Order into Texts[M].Emnlp,2004:404-411. [5]WAN X,XIAO J.CollabRank:Towards a Collaborative Ap-proach to SingleDocumentKeyphrase Extraction[C]∥Proceedings of the Conference International Conference on Computational Linguistics,COLING 2008.Manchester,Uk.DBLP,2008:969-976. [6]LIU Z,LIANG C,SUN M.Topical Word Trigger Model for Keyphrase Extraction[C]∥COLING.2012:1715-1730. [7]NGUYEN T D,KAN M Y.Keyphrase extraction in scientific publications[C]∥International Conference on Asian Digital Libraries:Looking Back 10 Years and Forging New Frontiers.Springer-Verlag,2007:317-326. [8]TOMOKIYO T,HURST M.A language model approach to keyphrase extraction[C]∥ACL 2003 Workshop on Multiword Expressions:Analysis,Acquisition and Treatment.Association for Computational Linguistics,2003:33-40. [9]HASAN K S,NG V.Automatic Keyphrase Extraction:A Survey of the State of the Art[C]∥Meeting of the Association for Computational Linguistics.2014:1262-1273. [10]WITTEN I H,PAYNTER G W,FRANK E,et al.KEA:practical automatic keyphrase extraction[C]∥ACM Conference on Digital Libraries.Berkeley,CA,USA.CiteSeer,1999:254-255. [11]TURNEY P D.Coherent keyphrase extraction via Web mining [C]∥International Joint Conference on Artificial Intelligence.Morgan Kaufmann Publishers Inc,2003:434-439. [12]FLORESCU C,CARAGEA C.A Position-Biased PageRank Algorithm for Keyphrase Extraction[C]∥Proceedings of the American Association for Artificial Intelligence.2017. [13]HASAN K S,NG V.Automatic Keyphrase Extraction:A Survey of the State of the Art[C]∥Meeting of the Association for Computational Linguistics.2014:1262-1273. [14]BOUDIN F.Reducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming[C]∥ACL 2015 Workshop on Novel Computational Approaches to Keyphrase Extraction.China,2015. [15]HULTH A.Improved automatic keyword extraction given more linguistic knowledge[C]∥Conference on Empirical Methods in Natural Language Processing.Association for Computational Linguistics,2003:216-223. [16]KIM S N,MEDELYANO,KAN M Y,et al.SemEval-2010 task 5:Automatic keyphrase extraction from scientific articles[C]∥International Workshop on Semantic Evaluation.Association for Computational Linguistics,2010:21-26. [17]LE T T N,NGUYEN M L,SHIMAZU A.Unsupervised Keyphrase Extraction:Introducing New Kinds of Words to Keyphrases[C]∥Australasian Joint Conference on Artificial Intelligence.Springer International Publishing,2016:665-671. |
[1] | 王宇飞, 陈文. 基于DECORATE集成学习与置信度评估的Tri-training算法 Tri-training Algorithm Based on DECORATE Ensemble Learning and Credibility Assessment 计算机科学, 2022, 49(6): 127-133. https://doi.org/10.11896/jsjkx.211100043 |
[2] | 陈壮, 邹海涛, 郑尚, 于化龙, 高尚. 基于用户覆盖及评分差异的多样性推荐算法 Diversity Recommendation Algorithm Based on User Coverage and Rating Differences 计算机科学, 2022, 49(5): 159-164. https://doi.org/10.11896/jsjkx.210300263 |
[3] | 刘意, 毛莺池, 程杨堃, 高建, 王龙宝. 基于邻域一致性的异常检测序列集成方法 Locality and Consistency Based Sequential Ensemble Method for Outlier Detection 计算机科学, 2022, 49(1): 146-152. https://doi.org/10.11896/jsjkx.201000156 |
[4] | 周钢, 郭福亮. 基于特征选择的高维数据集成学习方法研究 Research on Ensemble Learning Method Based on Feature Selection for High-dimensional Data 计算机科学, 2021, 48(6A): 250-254. https://doi.org/10.11896/jsjkx.200700102 |
[5] | 朱丽花, 王玲, 唐麒, 魏急波. 一种针对动态部分可重构SoC软硬件划分的高效MILP模型 Efficient MILP Model for HW/SW Partitioning of Dynamic Partial Reconfigurable SoC 计算机科学, 2020, 47(4): 18-24. https://doi.org/10.11896/jsjkx.190300001 |
[6] | 张艳红, 张春光, 周湘贞, 王怡鸥. 项目多属性模糊联合的多样性视频推荐算法 Diverse Video Recommender Algorithm Based on Multi-property Fuzzy Aggregate of Items 计算机科学, 2019, 46(8): 78-83. https://doi.org/10.11896/j.issn.1002-137X.2019.08.012 |
[7] | 张学扶, 曾攀, 金敏. 相关性和相似度联合的癌症分类预测 Cancer Classification Prediction Model Based on Correlation and Similarity 计算机科学, 2019, 46(7): 300-307. https://doi.org/10.11896/j.issn.1002-137X.2019.07.046 |
[8] | 关晓蔷, 庞继芳, 梁吉业. 基于类别随机化的随机森林算法 Randomization of Classes Based Random Forest Algorithm 计算机科学, 2019, 46(2): 196-201. https://doi.org/10.11896/j.issn.1002-137X.2019.02.030 |
[9] | 常啸林, 樊永文, 朱维军, 刘洋. 基于拟态防御的管理信息系统 Management Information System Based on Mimic Defense 计算机科学, 2019, 46(11A): 438-441. |
[10] | 石进平,李劲,和凤珍. 基于社交关系和用户偏好的多样性图推荐方法 Diversity Recommendation Approach Based on Social Relationship and User Preference 计算机科学, 2018, 45(6A): 423-427. |
[11] | 庞博,金乾坤,合尼古力·吾买尔,齐兴斌. 软件定义网络中基于网络切片和ILP模型的路由方案 Routing Scheme Based on Network Slicing and ILP Model in SDN 计算机科学, 2018, 45(4): 143-147. https://doi.org/10.11896/j.issn.1002-137X.2018.04.023 |
[12] | 单天羽, 管煜旸. 基于种群多样性的可变种群缩减差分进化算法 Differential Evolution Algorithm with Adaptive Population Size Reduction Based on Population Diversity 计算机科学, 2018, 45(11A): 160-166. |
[13] | 曹敏姿, 张琳琳, 毕雪华, 赵楷. 个性化(α,l)-多样性k-匿名隐私保护模型 Personalized (α,l)-diversity k-anonymity Model for Privacy Preservation 计算机科学, 2018, 45(11): 180-186. https://doi.org/10.11896/j.issn.1002-137X.2018.11.028 |
[14] | 何旭,景小宁,冯超,程越. 基于种群多样性的FPSO算法在空中加油区域配置中的应用 Diversity-guided FPSO Algorithm for Solving Air Refueling Region Deplaying Problem 计算机科学, 2017, 44(Z11): 547-551. https://doi.org/10.11896/j.issn.1002-137X.2017.11A.116 |
[15] | 焦重阳,周清雷,张文宁. 混合拓扑结构的粒子群算法及其在测试数据生成中的应用研究 MPSO and Its Application in Test Data Automatic Generation 计算机科学, 2017, 44(12): 249-254. https://doi.org/10.11896/j.issn.1002-137X.2017.12.045 |
|