Computer Science (计算机科学) ›› 2016, Vol. 43 ›› Issue (12): 36-40. doi: 10.11896/j.issn.1002-137X.2016.12.006
阿力甫·阿不都克里木,李晓
Ghalip ABDUKERIM and LI Xiao
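Abstract: To address the classification of Uyghur-language text, this paper proposes a method for Uyghur keyword extraction and text classification based on the TextRank algorithm and mutual-information similarity. First, the input text is preprocessed to filter out non-Uyghur characters and stop words. Next, a TextRank algorithm weighted by word semantic similarity, word position, and word-frequency importance extracts the keyword set of the text. Finally, a mutual-information similarity measure is used to compute the similarity between the keyword set of the input text and the keyword set of each class, and the text is classified accordingly. Experimental results show that the scheme extracts highly discriminative keywords; with a keyword set of size 1250, the average classification rate reaches 91.2%.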
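The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses plain, unweighted TextRank over a word co-occurrence graph, omitting the semantic-similarity, position, and frequency weights, and it substitutes Jaccard overlap for the mutual-information similarity, whose exact form the abstract does not give. The function names, the `window`, `damping`, and `iterations` parameters, and the assumption that the input is already tokenized and stop-word filtered are all hypothetical.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words with plain (unweighted) TextRank; not the paper's weighted variant."""
    # Build an undirected co-occurrence graph: two distinct words are linked
    # when one appears within `window` positions after the other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update, repeated a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def classify(keywords, class_keywords):
    """Pick the class whose keyword set overlaps most with `keywords`.

    Jaccard overlap is a stand-in assumption for the paper's
    mutual-information similarity measure.
    """
    def jaccard(a, b):
        a, b = set(a), set(b)
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    return max(class_keywords, key=lambda c: jaccard(keywords, class_keywords[c]))
```

In use, `textrank_keywords` would be applied to the preprocessed tokens of a document, and the resulting list passed to `classify` together with a dictionary mapping each class label to its keyword list.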