Computer Science, 2021, Vol. 48, Issue (6A): 342-348. doi: 10.11896/jsjkx.201000053
LI Yan1,2, FAN Bin2, GUO Jie2, LIN Zi-yuan1, ZHAO Zhao1
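Abstract: Based on k-prototypes clustering and rough set theory under equivalence relations, this paper proposes a new attribute reduction method for mixed data, aimed at target information systems that contain both continuous and symbolic values. First, k-prototypes clustering defines a distance over mixed data and yields the clusters of the information system, which form a partition of the universe. These clusters then replace the equivalence classes of rough set theory, giving rise to cluster-based approximation sets, a cluster-based positive region, and positive-region reducts. An attribute significance measure is defined from information entropy, and a variable-precision positive-region reduction method is established. This reduction handles numerical and symbolic data simultaneously, removes redundant attributes, improves classification performance, and lowers storage and running-time costs. Adjusting the clustering parameter k yields partitions of the universe at different granularities, which can be used to optimize the resulting reducts. Finally, extensive experiments on UCI datasets with four common classification algorithms compare classification accuracy before and after reduction, analyze in detail how the parameters affect the results, and verify the effectiveness of the reduction method.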
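The two building blocks named in the abstract, a mixed-data distance and a positive region computed over clusters instead of equivalence classes, can be illustrated with a minimal Python sketch. It is not the paper's algorithm. The distance follows the usual k-prototypes form (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches), and the variable-precision rule (a cluster belongs to the positive region when its majority decision class covers at least a fraction `beta` of its objects) is an assumption made here for illustration. The names `mixed_distance`, `assign_clusters`, `positive_region`, `gamma`, and `beta` are hypothetical, and the prototypes are taken as given rather than learned.

```python
from collections import Counter, defaultdict


def mixed_distance(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """k-prototypes-style dissimilarity between an object and a prototype.

    Numeric part: squared Euclidean distance.
    Categorical part: number of mismatched values, weighted by gamma.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric + gamma * categorical


def assign_clusters(objects, prototypes, gamma=1.0):
    """Assign each (numeric, categorical) object to its nearest prototype.

    The resulting labels partition the universe; these clusters stand in
    for the equivalence classes of classical rough set theory.
    """
    labels = []
    for x_num, x_cat in objects:
        dists = [
            mixed_distance(x_num, x_cat, p_num, p_cat, gamma)
            for p_num, p_cat in prototypes
        ]
        labels.append(dists.index(min(dists)))
    return labels


def positive_region(cluster_labels, decisions, beta=1.0):
    """Indices of objects in clusters that are 'pure enough'.

    Assumed rule (illustrative only): a cluster is in the positive region
    when its most frequent decision value covers at least a fraction beta
    of the cluster. beta=1.0 gives the strict, non-variable-precision case.
    """
    members = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        members[label].append(idx)

    region = set()
    for idxs in members.values():
        counts = Counter(decisions[i] for i in idxs)
        majority = counts.most_common(1)[0][1]
        if majority / len(idxs) >= beta:
            region.update(idxs)
    return region


# Toy data: one numeric and one categorical attribute per object.
objects = [((0.0,), ("a",)), ((0.1,), ("a",)), ((5.0,), ("b",)), ((5.2,), ("a",))]
prototypes = [((0.0,), ("a",)), ((5.0,), ("b",))]
decisions = ["y", "y", "n", "y"]

labels = assign_clusters(objects, prototypes)
strict = positive_region(labels, decisions, beta=1.0)
relaxed = positive_region(labels, decisions, beta=0.5)
```

In this toy example the second cluster mixes two decision values, so it drops out of the positive region under the strict threshold and re-enters once `beta` is relaxed. A reduction procedure along the lines of the abstract would repeat this computation on attribute subsets and keep a subset whose positive region is preserved, and changing the number of prototypes changes the granularity of the partition.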