Computer Science ›› 2019, Vol. 46 ›› Issue (6A): 142-145.
徐立
XU Li
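Abstract: To improve the accuracy of text keyword extraction, this paper proposes a keyword extraction method. The method combines four factors that influence keyword extraction (term frequency, word length, word position, and part of speech) into a weighting formula for candidate keywords. The relatively optimal coefficients of the weighting formula are obtained experimentally, and the formula is then applied to the candidate-keyword scoring formula of the TextRank algorithm. Experiments compare the precision, recall, and F-measure of the OPW-TextRank algorithm and the TextRank algorithm on single-document keyword extraction. The results show that, with a window size of 6, OPW-TextRank extracts keywords with higher precision than TextRank. The proposed algorithm is of practical use in natural language processing systems that build on text keyword extraction.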
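The abstract does not give the weighting formula or its coefficients, so the following is only a minimal sketch of the general idea, under stated assumptions: a hypothetical linear weight over the four factors, used to bias the TextRank transition from a word toward its higher-weight neighbours within a co-occurrence window of 6. The function names, the coefficient values, the part-of-speech scores, and the way the weight enters the score are all illustrative assumptions, not the paper's actual definitions. A small helper for precision, recall, and F-measure is included because those are the metrics the abstract reports.

```python
from collections import defaultdict


def candidate_weight(word, tokens, pos_tags, coeffs=(0.4, 0.2, 0.2, 0.2)):
    """Hypothetical linear weight; the coefficients are placeholders."""
    a, b, c, d = coeffs
    freq = tokens.count(word) / len(tokens)            # term frequency
    length = min(len(word), 10) / 10                   # word length, capped
    position = 1.0 - tokens.index(word) / len(tokens)  # earlier is higher
    pos_score = 1.0 if pos_tags.get(word) == "n" else 0.5  # assumed POS scores
    return a * freq + b * length + c * position + d * pos_score


def weighted_textrank(tokens, pos_tags, window=6, damping=0.85, iters=50):
    """Rank words; a neighbour's vote is split in proportion to word weight."""
    # Undirected co-occurrence graph over a sliding window of `window` tokens.
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)

    vocab = sorted(set(tokens))
    weight = {w: candidate_weight(w, tokens, pos_tags) for w in vocab}
    score = {w: 1.0 for w in vocab}

    for _ in range(iters):
        new_score = {}
        for w in vocab:
            rank = 0.0
            for u in neighbors[w]:
                # Share of u's score that goes to w (assumed formulation).
                total = sum(weight[v] for v in neighbors[u])
                rank += weight[w] / total * score[u]
            new_score[w] = (1 - damping) + damping * rank
        score = new_score

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-score[w], w))


def precision_recall_f(extracted, reference):
    """Precision, recall, and F-measure of extracted vs. reference keywords."""
    hits = len(set(extracted) & set(reference))
    p = hits / len(extracted) if extracted else 0.0
    r = hits / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

In use, `tokens` would be the segmented, stop-word-filtered words of one document and `pos_tags` a mapping from word to part-of-speech tag; the top few entries of the returned list would be compared against manually assigned keywords with `precision_recall_f`.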