Computer Science ›› 2017, Vol. 44 ›› Issue (1): 25-31. doi: 10.11896/j.issn.1002-137X.2017.01.005

• 2016 6th China Conference on Data Mining •


  1. Department of Computer Science and Technology, College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China; Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804, China; College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China; College of Computer Science and Software Engineering, Shenzhen University, Guangdong 518060, China; Faculty of Applied Science and Textiles, The Hong Kong Polytechnic University, Hong Kong, China
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61273304) and the 2013 Specialized Research Fund for the Doctoral Program of Higher Education (20130072130004).

Rough Set Attribute Reduction Algorithm for Partially Labeled Data

ZHANG Wei, MIAO Duo-qian, GAO Can and LI Feng   

  • Online:2018-11-13 Published:2018-11-13


Abstract: Attribute reduction, an important preprocessing step for knowledge acquisition in data mining, is one of the key issues in rough set theory. Rough set theory is an effective supervised learning model for labeled data; however, attribute reduction for partially labeled data lies outside the realm of traditional rough set theory. In this paper, a rough set attribute reduction algorithm for partially labeled data is proposed based on co-training, which capitalizes on the unlabeled data to improve the quality of the attribute reducts derived from the few labeled data. The algorithm first computes two diverse reducts of the labeled data and employs them to train two base classifiers, and then co-trains the two base classifiers iteratively. In each round, the base classifiers learn from each other on the unlabeled data and enlarge the labeled data set, so that reducts of better quality can be computed from the enlarged labeled data and used to construct base classifiers of higher performance. Theoretical analysis and experimental results on UCI data sets show that the proposed algorithm selects only a few attributes while preserving classification power.
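For readers unfamiliar with Pawlak-style reducts, the following minimal Python sketch illustrates the classical positive-region view of attribute reduction that the abstract builds on. It is an illustrative sketch only, not the authors' implementation: the function names (`partition`, `gamma`, `greedy_reduct`), the data layout (rows as tuples plus a separate label list), and the greedy forward-selection heuristic are all assumptions.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation:
    group row indices by their values on the chosen attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def gamma(rows, labels, attrs):
    """Dependency degree: fraction of objects in the positive region,
    i.e. in equivalence classes that are pure w.r.t. the decision label."""
    if not attrs:
        return 0.0
    pure = sum(len(b) for b in partition(rows, attrs)
               if len({labels[i] for i in b}) == 1)
    return pure / len(rows)

def greedy_reduct(rows, labels, all_attrs):
    """Forward greedy selection: repeatedly add the attribute that most
    increases the dependency degree until it matches that of the full set."""
    target = gamma(rows, labels, all_attrs)
    red = []
    while gamma(rows, labels, red) < target:
        best = max((a for a in all_attrs if a not in red),
                   key=lambda a: gamma(rows, labels, red + [a]))
        red.append(best)
    return red
```

For the toy decision table `rows = [(0,0,1), (0,1,1), (1,0,0), (1,1,0)]` with `labels = [0,0,1,1]`, `greedy_reduct(rows, labels, [0,1,2])` returns `[0]`: attribute 0 alone already determines the decision, so the other two attributes are redundant.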

Key words: Rough sets, Incremental attribute reduction, Co-training, Partially labeled data, Semi-supervised learning
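The iterative co-training loop described in the abstract can be sketched as follows. This is a generic Blum–Mitchell-style skeleton under simplifying assumptions: the 1-NN base classifier, the fixed attribute views passed in as `view1`/`view2`, and labelling the front of the unlabeled pool are all placeholders — the paper instead derives the two views from diverse reducts and recomputes better reducts as the labeled set grows.

```python
def nn_predict(train, train_labels, x):
    """1-nearest-neighbour under Hamming distance -- a stand-in base classifier."""
    best = min(range(len(train)),
               key=lambda i: sum(a != b for a, b in zip(train[i], x)))
    return train_labels[best]

def co_train(X, y_labeled, n_labeled, view1, view2, rounds=5, per_round=2):
    """Skeleton co-training: two classifiers, each restricted to its own
    attribute view, take turns labelling objects from the unlabeled pool."""
    proj = lambda row, view: tuple(row[a] for a in view)
    y = dict(enumerate(y_labeled))             # index -> (possibly pseudo) label
    L = list(range(n_labeled))                 # labeled indices
    U = list(range(n_labeled, len(X)))         # unlabeled pool
    for _ in range(rounds):
        if not U:
            break
        for view in (view1, view2):
            train = [proj(X[i], view) for i in L]
            labs = [y[i] for i in L]
            picked, U = U[:per_round], U[per_round:]   # in practice: most confident
            for i in picked:
                y[i] = nn_predict(train, labs, proj(X[i], view))
                L.append(i)
    return y
```

Each classifier trains only on its own view's projection of the currently labeled objects, so a label contributed by one view gives the other view genuinely new training information in the next step — the mechanism the abstract relies on to enlarge the labeled set and recompute better reducts.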

