Computer Science, 2021, Vol. 48, Issue (6A): 342-348. doi: 10.11896/jsjkx.201000053

• Intelligent Computing •

Attribute Reduction Method Based on k-prototypes Clustering and Rough Sets

LI Yan1,2, FAN Bin2, GUO Jie2, LIN Zi-yuan1, ZHAO Zhao1   

  1 School of Applied Mathematics, Beijing Normal University, Zhuhai, Zhuhai, Guangdong 519087, China
  2 College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002, China
  • Online: 2021-06-10 Published: 2021-06-17
  • About author: LI Yan, born in 1976, Ph.D., professor, master's supervisor, is a member of the China Computer Federation. Her main research interests include granular computing, knowledge discovery and machine learning.
  • Supported by:
    NSF of Guangdong Province (2018A0303130026), NSF of Hebei Province (F2018201096), National Natural Science Foundation of China (61976141) and Key Science and Technology Foundation of the Educational Department of Hebei Province (ZD2019021).

Abstract: For target information systems containing both continuous and symbolic values, a novel attribute reduction method suitable for hybrid data is proposed, based on k-prototypes clustering and rough set theory under equivalence relations. First, a distance for hybrid data is defined and k-prototypes clustering is applied to the information system; the resulting clusters form a partition of the universe. These clusters then replace the equivalence classes of rough set theory, and the concepts of cluster-based approximation set, positive region and attribute reduction are defined accordingly. An attribute importance measure is also defined based on information entropy and the clusters. Finally, a variable precision positive-region reduction method is established, which can process both numerical and symbolic data, remove redundant attributes, reduce storage and running time costs, and improve the performance of classification algorithms. In addition, partitions of the universe at different granularities can be obtained by adjusting the clustering parameter k, so that the attribute reduction can be optimized. Extensive experiments are carried out on 11 UCI data sets with four common classification algorithms, and the classification accuracies before and after reduction are compared. The influence of the parameters on the results is analyzed in detail, and the results verify the effectiveness of the reduction method.
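
The abstract does not state the exact formulas, so the following Python is only a minimal sketch of the two ideas it describes, under stated assumptions. The distance follows the standard k-prototypes dissimilarity (squared Euclidean distance on numerical attributes plus a weight times the number of mismatched symbolic attributes); the paper's own hybrid distance may differ. The positive region is assumed to collect every cluster whose majority decision class reaches a precision threshold. The names `mixed_distance`, `positive_region`, `gamma` and `beta` are illustrative and do not come from the paper.

```python
from collections import Counter, defaultdict


def mixed_distance(x_num, y_num, x_cat, y_cat, gamma=1.0):
    """k-prototypes-style dissimilarity between two hybrid objects.

    Assumption: squared Euclidean distance on the numerical part plus
    gamma times the number of mismatched symbolic values.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    mismatches = sum(a != b for a, b in zip(x_cat, y_cat))
    return numeric + gamma * mismatches


def positive_region(cluster_ids, decisions, beta=1.0):
    """Indices of objects lying in clusters that are 'pure enough'.

    Assumption: a cluster belongs to the variable precision positive
    region when the share of its most frequent decision value is at
    least beta. With beta = 1.0 only fully consistent clusters count.
    """
    members = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        members[cluster].append(idx)

    region = []
    for idxs in members.values():
        counts = Counter(decisions[i] for i in idxs)
        top_share = counts.most_common(1)[0][1] / len(idxs)
        if top_share >= beta:
            region.extend(idxs)
    return sorted(region)
```

In this sketch the clusters play the role that equivalence classes play in classical rough sets: lowering `beta` admits clusters with some decision noise, while changing the number of clusters k changes how fine the partition of the universe is.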

Key words: k-prototypes clustering, Attribute reduction, Hybrid data, Multi-granule, Rough set

CLC Number: 

  • TP181