Computer Science ›› 2015, Vol. 42 ›› Issue (5): 106-108. doi: 10.11896/j.issn.1002-137X.2015.05.021

• 2014 Data Mining Conference •

Pre-processing Method of Multi-label Classification Based on kNN

XU Xiao-dan, YAO Ming-hai, LIU Hua-wen and ZHENG Zhong-long

  1. College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023 (XU Xiao-dan, YAO Ming-hai); 2. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua 321004 (XU Xiao-dan, LIU Hua-wen, ZHENG Zhong-long)
  • Online: 2018-11-14   Published: 2018-11-14
  • Supported by:
    This work was supported by the Research Project of the Education Department of Zhejiang Province (Y201328291) and the Zhejiang Provincial Natural Science Foundation (LZ14F030001, LY14F020012).


Abstract: Multi-label learning has become a research hotspot in machine learning. To improve classification performance, the noisy data in the training set are pre-processed: a kNN-based de-noising method for multi-label classification is proposed. Analysis of existing multi-label datasets shows that they exhibit an approximately normal distribution; noisy labels are then replaced with the labels of their k nearest neighbors, which filters out part of the noise and yields datasets of relatively high quality. Six multi-label classification algorithms were tested on several datasets on the MULAN platform, both before and after noise removal. The experimental results show that this pre-processing method effectively improves classifier performance, and that it is best suited to datasets with a clearly pronounced distribution.

Key words: Multi-label, Classification, Normal distribution, Pre-processing, kNN
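
The abstract describes the de-noising step only in words. Below is a minimal Python sketch of the general idea, assuming a binary label matrix Y and a feature matrix X; the function name denoise_labels, the majority-vote rule, the 0.5 disagreement threshold, and the use of scikit-learn's NearestNeighbors are illustrative assumptions rather than the authors' implementation (the paper's experiments were run on the Java-based MULAN platform).

```python
# Illustrative sketch of kNN-based label de-noising for multi-label data.
# Assumption: a label entry is treated as "noisy" when it disagrees with the
# majority vote of its k nearest neighbors in feature space; this mirrors the
# idea in the abstract but is not the authors' exact procedure.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def denoise_labels(X, Y, k=5):
    """Replace suspect label entries with the majority label of the k nearest neighbors.

    X : (n_samples, n_features) feature matrix
    Y : (n_samples, n_labels) binary label matrix
    """
    # Fit k+1 neighbors because each point is returned as its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbor_idx = idx[:, 1:]          # drop the self-neighbor

    Y_clean = Y.copy()
    for i in range(X.shape[0]):
        # Majority vote of the neighbors for every label of sample i.
        votes = Y[neighbor_idx[i]].mean(axis=0) >= 0.5
        disagree = votes != Y[i].astype(bool)
        # Overwrite only the entries that conflict with the neighborhood vote.
        Y_clean[i, disagree] = votes[disagree].astype(Y.dtype)
    return Y_clean

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))                 # synthetic features
    Y = (rng.random((100, 4)) > 0.7).astype(int)   # synthetic binary label matrix
    print("entries changed:", int((denoise_labels(X, Y, k=5) != Y).sum()))
```

In practice k and the voting threshold would be tuned per dataset; the paper itself reports results for six multi-label classifiers on MULAN before and after this kind of pre-processing.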

