Computer Science (计算机科学), 2018, Vol. 45, Issue (10): 217-224. doi: 10.11896/j.issn.1002-137X.2018.10.040
LI Hong-li (李虹利), MENG Zu-qiang (蒙祖强)
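Abstract: Attribute reduction for incomplete and inconsistent data is an important topic in data mining research. This paper combines information gain with a degree of inconsistency and proposes an attribute reduction algorithm for incomplete, inconsistent data. First, information gain is introduced, the concept of the inconsistency degree and its computing formula are defined, and a method for filling in missing data based on both measures is given. Then, building on this filling method, an attribute reduction algorithm is presented that uses the information gain under the maximum inconsistency degree as attribute weights and the inconsistency degree as the heuristic information for reduction. Finally, experiments demonstrate the effectiveness of the proposed algorithm.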
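The abstract names the ingredients of the method but not their exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes: (1) information gain IG(D; a) = H(D) - H(D | a), computed over the objects whose value of attribute a is known; (2) the inconsistency degree of an attribute subset B is the fraction of objects whose B-equivalence class contains more than one decision value; (3) a hypothetical filling rule `impute` that replaces each missing value with the observed value giving the lowest inconsistency degree; and (4) a greedy forward search `reduce_attrs` that uses the inconsistency degree as the heuristic and information gain as a tie-breaking weight. The paper's weighting by information gain under the maximum inconsistency degree is simplified here; all function names and the toy table are illustrative.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of decision labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def info_gain(rows, dec, a):
    """IG(D; a) = H(D) - H(D | a), over rows where attribute a is known (assumption)."""
    known = [r for r in rows if r[a] is not None]
    if not known:
        return 0.0
    groups = defaultdict(list)
    for r in known:
        groups[r[a]].append(r[dec])
    cond = sum(len(g) / len(known) * entropy(g) for g in groups.values())
    return entropy([r[dec] for r in known]) - cond


def inconsistency(rows, dec, attrs):
    """Assumed definition: share of objects whose equivalence class under
    `attrs` holds more than one decision value."""
    classes = defaultdict(list)
    for r in rows:
        classes[tuple(r[a] for a in attrs)].append(r[dec])
    bad = sum(len(ds) for ds in classes.values() if len(set(ds)) > 1)
    return bad / len(rows)


def impute(rows, dec, attrs):
    """Hypothetical filling rule: for each missing cell (None), try every observed
    value of that attribute and keep the one with the lowest inconsistency over
    all attributes; ties go to the smallest value. Returns a filled copy."""
    rows = [dict(r) for r in rows]
    for a in attrs:
        domain = sorted({r[a] for r in rows if r[a] is not None})
        for r in rows:
            if r[a] is None and domain:
                scores = []
                for v in domain:
                    r[a] = v
                    scores.append((inconsistency(rows, dec, attrs), v))
                r[a] = min(scores)[1]
    return rows


def reduce_attrs(rows, dec, attrs):
    """Greedy forward selection: repeatedly add the attribute that lowers the
    inconsistency degree most (higher information gain breaks ties) until the
    subset is as consistent as the full attribute set."""
    target = inconsistency(rows, dec, attrs)
    red = []
    while inconsistency(rows, dec, red) > target:
        best = min(
            (a for a in attrs if a not in red),
            key=lambda a: (inconsistency(rows, dec, red + [a]), -info_gain(rows, dec, a)),
        )
        red.append(best)
    return red


# Toy incomplete decision table (illustrative data, not from the paper).
table = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": None, "d": "no"},
    {"a": 1, "b": None, "c": 1, "d": "yes"},
]
filled = impute(table, "d", ["a", "b", "c"])
print(reduce_attrs(filled, "d", ["a", "b", "c"]))  # ['a']
```

In the toy table the decision `d` coincides with attribute `a`, so the search stops after selecting `a`. Both missing values are filled with 0 because every candidate leaves the table consistent and the tie goes to the smallest value. Filling before reducing mirrors the order described in the abstract (fill first, then reduce); a tolerance-relation approach that reasons directly over missing values would be an alternative design.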