Computer Science (计算机科学), 2018, Vol. 45, Issue 6A: 91-96.
CHEN Wei (陈伟)1, WU You-zheng (吴友政)2, CHEN Wen-liang (陈文亮)1, ZHANG Min (张民)1
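Abstract: Automatic keyword extraction is an important task in natural language processing (NLP) and provides key technical support for applications such as personalized recommendation and online shopping. To address this problem, the paper proposes a new method based on a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) and formulates keyword extraction as a sequence labeling problem. First, the method models the input text and represents it as low-dimensional dense vectors. Next, it uses a classification algorithm to classify each word. Finally, it uses the CRF to decode the entire label sequence and obtain the final result. Experiments on a large-scale real-world dataset show that the method improves on the baseline system by about 1 percentage point.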
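The final decoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a BIO tagging scheme (B = first word of a keyword, I = continuation, O = not a keyword), which the abstract does not specify, and it takes the per-word scores that a BiLSTM would produce as a plain input. The function names `viterbi_decode` and `extract_keywords` and the score tables are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Assumed tag set (not stated in the abstract): Begin / Inside / Outside.
TAGS = ["B", "I", "O"]


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag] is the per-word score for `tag` at position t
    (in a BiLSTM-CRF this would come from the BiLSTM layer).
    transitions[prev][cur] is the score for moving from `prev` to `cur`.
    Simplification: no special start/end transition scores are used.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back = []  # back[t-1][tag]: best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda k: best[-1][k])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path


def extract_keywords(words: List[str], tags: List[str]) -> List[str]:
    """Group B/I-tagged words into keyword strings."""
    keywords, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B":
            if current:
                keywords.append(" ".join(current))
            current = [word]
        elif tag == "I" and current:
            current.append(word)
        else:
            if current:
                keywords.append(" ".join(current))
            current = []
    if current:
        keywords.append(" ".join(current))
    return keywords
```

Decoding the whole sequence jointly, rather than taking the best tag for each word independently, lets the transition scores rule out inconsistent outputs such as an I tag that directly follows an O tag.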