Computer Science, 2016, Vol. 43, Issue 12: 36-40. doi: 10.11896/j.issn.1002-137X.2016.12.006


Uyghur Keyword Extraction and Text Classification Based on TextRank Algorithm and Mutual Information Similarity

Ghalip ABDUKERIM and LI Xiao   

  • Online:2018-12-01 Published:2018-12-01

Abstract: To address text classification in the Uyghur language, this paper proposes a keyword extraction and text classification scheme based on the TextRank algorithm and mutual information similarity. First, the input document is preprocessed to filter out non-Uyghur characters and stop words. Then, the keyword set of the text is extracted with a TextRank algorithm whose weights combine the semantic similarity of words, word position, and frequency importance. Finally, the mutual information similarity between the keyword set of the input text and the keyword sets of the various categories is measured, and the text is classified accordingly. The experimental results show that the scheme extracts keywords efficiently and that the average classification rate reaches 91.2% when the set size is 1250.

Key words: Uyghur language, Text categorization, Keyword extraction, TextRank algorithm, Mutual information similarity
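
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the tokens have already been preprocessed (non-Uyghur characters and stop words removed), it uses standard TextRank over a co-occurrence graph rather than the paper's weighting by semantic similarity, position, and frequency, and it substitutes an averaged positive pointwise mutual information score for the paper's mutual information similarity, whose exact definition the abstract does not give. The function names `textrank_keywords` and `ppmi_similarity` and all parameters are hypothetical.

```python
import math
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words with standard TextRank on a co-occurrence graph.

    Assumption: edge weights are plain co-occurrence counts, not the
    semantic/position/frequency weights described in the paper.
    """
    # Undirected graph: weight = number of co-occurrences within the window.
    weights = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                weights[w][v] += 1.0
                weights[v][w] += 1.0

    nodes = sorted(weights)
    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour m passes on its score in proportion to edge weight.
            rank = sum(
                weights[m][n] / sum(weights[m].values()) * scores[m]
                for m in weights[n]
            )
            updated[n] = (1 - damping) + damping * rank
        scores = updated
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(nodes, key=lambda n: (-scores[n], n))[:top_k]


def ppmi_similarity(keywords_a, keywords_b, documents):
    """Average positive PMI over all keyword pairs (assumed stand-in measure).

    Probabilities are estimated from document frequencies in `documents`,
    a list of token lists.
    """
    docs = [set(d) for d in documents]
    n = len(docs)

    def df(*words):
        # Number of documents containing all the given words.
        return sum(1 for d in docs if all(w in d for w in words))

    total, pairs = 0.0, 0
    for a in keywords_a:
        for b in keywords_b:
            pairs += 1
            joint = df(a, b)
            if joint:
                pmi = math.log(joint * n / (df(a) * df(b)))
                total += max(0.0, pmi)
    return total / pairs if pairs else 0.0
```

Under these assumptions, a document would be assigned to the category whose keyword set yields the highest `ppmi_similarity` with the keywords returned by `textrank_keywords`.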

