Computer Science ›› 2018, Vol. 45 ›› Issue (11A): 417-421.

• Big Data & Data Mining •

Word Clustering Based Text Semantic Tagging Extraction Method

LI Xiong, DING Zhi-ming, SU Xing, GUO Li-min   

  1. Department of Information,Beijing University of Technology,Beijing 100124,China
  • Online:2019-02-26 Published:2019-02-26

Abstract: This work addresses the problem of extracting key semantic information from large volumes of text data. Text is the information carrier of natural language, and the features extracted from it differ depending on the goals and methods of the analysis. Previous semantic tag extraction methods usually focus on a single text and ignore the semantic relationships between different texts. To address this, the paper proposes a text semantic tag extraction method based on word clustering. Oriented toward the goal of semantic tag extraction, the method represents text information using Hinton's distributed representation hypothesis and applies a word clustering algorithm to maximize the semantic similarity between the extracted semantic tags and the original text data. Experiments show that, because every word in the vocabulary takes part in the clustering computation, the proposed method outperforms many existing methods in semantic richness and expressive power.

Key words: Clustering, Distributed representation hypothesis, Semantic extraction, Similarity

CLC Number: 

  • TP391
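
The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes words are already mapped to dense vectors (the hand-made two-dimensional `TOY_EMBEDDINGS` stand in for learned distributed representations), substitutes a simple k-means-style clustering with cosine similarity for the unspecified word clustering algorithm, and picks the word closest to each cluster centroid as a semantic tag. All function names and data here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_words(embeddings, k, iterations=20):
    """Group words into k clusters by cosine similarity to cluster centroids.

    Assumption: centroids are seeded with the first k words so the result
    is deterministic; a real system would likely use a better initialization.
    """
    words = list(embeddings)
    centroids = [embeddings[w] for w in words[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for w in words:
            best = max(range(k), key=lambda i: cosine(embeddings[w], centroids[i]))
            clusters[best].append(w)
        new_centroids = [
            mean([embeddings[w] for w in members]) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return clusters, centroids


def extract_tags(embeddings, k):
    """Return one tag per non-empty cluster: the word most similar to its centroid."""
    clusters, centroids = cluster_words(embeddings, k)
    return [
        max(members, key=lambda w: cosine(embeddings[w], centroid))
        for members, centroid in zip(clusters, centroids)
        if members
    ]


# Hypothetical toy vectors; real inputs would be learned word embeddings.
TOY_EMBEDDINGS = {
    "cluster": [0.9, 0.1],
    "traffic": [0.1, 0.9],
    "kmeans": [0.8, 0.2],
    "centroid": [1.0, 0.0],
    "road": [0.0, 1.0],
    "vehicle": [0.2, 0.8],
}

if __name__ == "__main__":
    print(extract_tags(TOY_EMBEDDINGS, k=2))
```

Because every word in the vocabulary is assigned to a cluster before tags are chosen, each tag reflects the whole word set and not only the most frequent terms, which matches the motivation given in the abstract.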