Computer Science ›› 2017, Vol. 44 ›› Issue (1): 25-31.doi: 10.11896/j.issn.1002-137X.2017.01.005


Rough Set Attribute Reduction Algorithm for Partially Labeled Data

ZHANG Wei, MIAO Duo-qian, GAO Can and LI Feng   

Online: 2018-11-13    Published: 2018-11-13

Abstract: Attribute reduction, an important preprocessing step for knowledge acquisition in data mining, is one of the key issues in rough set theory. Rough set theory is an effective supervised learning model for labeled data; however, attribute reduction for partially labeled data falls outside the scope of the traditional theory. This paper proposes a rough set attribute reduction algorithm for partially labeled data based on co-training, which exploits the unlabeled data to improve the quality of the attribute reducts computed from the few labeled data. The algorithm first computes two diverse reducts of the labeled data and uses them to train two base classifiers; the base classifiers are then co-trained iteratively. In each round, the base classifiers learn from each other on the unlabeled data and enlarge the labeled set, so that better reducts can be computed from the enlarged labeled data and used to construct base classifiers of higher performance. Theoretical analysis and experimental results on UCI data sets show that the proposed algorithm selects few attributes while preserving classification power.

Key words: Rough sets, Incremental attribute reduction, Co-training, Partially labeled data, Semi-supervised learning
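The iterative scheme described in the abstract (two classifiers built on two different attribute subsets that pseudo-label unlabeled examples for each other) can be sketched as follows. This is an illustrative toy, not the paper's implementation: a nearest-centroid classifier stands in for the reduct-based rough set classifiers, and two fixed column subsets (`view1`, `view2`) stand in for the two diverse reducts; all names and parameters are assumptions for the sketch.

```python
import numpy as np

class NearestCentroid:
    """Toy base classifier; the paper would use classifiers induced
    from rough set reducts instead."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

    def margin(self, X):
        # Confidence proxy: gap between the two nearest centroids.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        d.sort(axis=1)
        return d[:, 1] - d[:, 0]

def co_train(X_lab, y_lab, X_unl, view1, view2, rounds=5, per_round=2):
    """Co-training on two attribute subsets: each round, each view's
    classifier pseudo-labels its most confident unlabeled examples,
    enlarging the labeled set used to retrain both classifiers."""
    X1, X2 = X_lab[:, view1], X_lab[:, view2]
    y = y_lab.copy()
    pool = X_unl.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        h1 = NearestCentroid().fit(X1, y)
        h2 = NearestCentroid().fit(X2, y)
        picked = []
        for h, v in ((h1, view1), (h2, view2)):
            conf = h.margin(pool[:, v])
            for i in np.argsort(conf)[::-1][:per_round]:
                if i not in picked:
                    picked.append(i)
                    # Add the pseudo-labeled example to both views.
                    X1 = np.vstack([X1, pool[[i]][:, view1]])
                    X2 = np.vstack([X2, pool[[i]][:, view2]])
                    y = np.append(y, h.predict(pool[[i]][:, v])[0])
        pool = np.delete(pool, picked, axis=0)
    # Final classifiers trained on the enlarged labeled data.
    return NearestCentroid().fit(X1, y), NearestCentroid().fit(X2, y)
```

In the paper's setting, the enlarged labeled set would additionally be fed back into (incremental) attribute reduction each round, so the views themselves improve; the sketch keeps the views fixed for brevity.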

