Computer Science (计算机科学), 2015, Vol. 42, Issue (7): 270-275. doi: 10.11896/j.issn.1002-137X.2015.07.058

• Artificial Intelligence •

Multi-label Text Classification Based on Robust Fuzzy Rough Set Model

ZHANG Jing (1), LI De-yu (1,2), WANG Su-ge (1,2), LI Hua (1)

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
  • Published: 2018-11-14 Online: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61175067,5), the Shanxi Province Science and Technology Research Project (20110321027-02), and the Shanxi Province Research Project for Returned Overseas Scholars (2013-014)

Abstract: To address the uncertainty of multi-label data and the presence of noisy data, a novel multi-label robust fuzzy rough classification model is proposed. The model extends the k-mean robust statistics fuzzy rough classification model, which was designed for single-label classification. For each unlabeled instance, the model first obtains its membership degree with respect to each label using a similarity measure. It then defines the degree of correlation between the instance and each label from these membership degrees. Finally, it assigns an appropriate threshold to each group of correlation degrees to separate correlated labels from uncorrelated ones, which yields the relevant label set. Experimental results on three benchmark multi-label datasets and one real-world multi-label text dataset indicate that, for multi-label text classification, the proposed model outperforms the commonly used ML-kNN and rank-SVM multi-label learning methods on six widely used multi-label evaluation metrics.

Key words: Fuzzy rough set, k-mean robust statistics, Membership degree, Multi-label learning
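
The three steps named in the abstract (similarity-based membership, correlation degree, thresholding) can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not give the formulas, so the Gaussian similarity, the "mean of the k largest similarities" statistic, the use of the membership itself as the correlation degree, and the fixed threshold of 0.5 are all assumptions made here for illustration. The function names are hypothetical as well.

```python
import numpy as np


def gaussian_similarity(x, X, sigma=1.0):
    """Fuzzy similarity between sample x and every row of X.

    Assumption: a Gaussian kernel; the paper's similarity measure may differ.
    """
    squared_dist = np.sum((X - x) ** 2, axis=1)
    return np.exp(-squared_dist / (2.0 * sigma ** 2))


def k_mean_max(values, k):
    """Mean of the k largest values, a noise-tolerant stand-in for max.

    Assumption: this averaging is only a loose analogue of a k-mean
    robust statistic; it returns 0.0 when no values are available.
    """
    if values.size == 0:
        return 0.0
    k = min(k, values.size)
    return float(np.mean(np.sort(values)[-k:]))


def label_memberships(x, X_train, Y_train, k=3, sigma=1.0):
    """Membership of x in each label, from training samples carrying that label."""
    sim = gaussian_similarity(x, X_train, sigma)
    n_labels = Y_train.shape[1]
    return np.array(
        [k_mean_max(sim[Y_train[:, j] == 1], k) for j in range(n_labels)]
    )


def predict_labels(x, X_train, Y_train, k=3, sigma=1.0, threshold=0.5):
    """Threshold the correlation degrees to obtain a label set.

    Assumption: the correlation degree is taken to be the membership itself,
    and a single fixed threshold is used for all labels.
    """
    degrees = label_memberships(x, X_train, Y_train, k, sigma)
    return (degrees >= threshold).astype(int), degrees
```

Averaging the k largest similarities instead of taking a single maximum means that one mislabeled or outlying training sample cannot decide the membership on its own, which is the kind of noise tolerance the abstract attributes to robust statistics.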

