Computer Science, 2018, Vol. 45, Issue (10): 217-224. doi: 10.11896/j.issn.1002-137X.2018.10.040

• Artificial Intelligence •

Attribute Reduction Algorithm Using Information Gain and Inconsistency to Fill

LI Hong-li, MENG Zu-qiang

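  1. College of Computer and Electronic Information, Guangxi University, Nanning 530004, China
  • Received: 2017-08-10  Online: 2018-11-05  Published: 2018-11-05
  • About the authors: LI Hong-li (1990-), male, master's degree; main research interests: data mining and machine learning. MENG Zu-qiang (1974-), male, Ph.D., professor; main research interests: artificial intelligence, data mining and knowledge discovery, intelligent decision-making, and intelligent information processing. E-mail: mengzuqiang@163.com (corresponding author).
  • Supported by: National Natural Science Foundation of China (61762009, 61363027) and Guangxi Natural Science Foundation (2015GXNSFAA139292).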

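Abstract: Attribute reduction for incomplete and inconsistent data is an important topic in data mining research. Combining information gain with the inconsistency degree, this paper proposes an attribute reduction algorithm for incomplete and inconsistent data. First, information gain is introduced, the concept and computing formula of the inconsistency degree are defined, and a method for filling in missing values based on both measures is given. Then, building on this filling method, an attribute reduction algorithm is presented that uses the information gain under the maximum inconsistency degree as the weight and the inconsistency degree as the heuristic information. Finally, experimental results demonstrate the effectiveness of the proposed algorithm.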
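The abstract names the ingredients but not the formulas, so the Python sketch below only illustrates the general shape of such a pipeline under explicit assumptions. Information gain is the standard entropy-based (ID3-style) measure. The inconsistency degree is assumed, hypothetically, to be the fraction of objects whose equivalence class under the chosen attributes carries more than one decision value. The filling rule (visit attributes by decreasing information gain, give each missing value the observed value that yields the lowest inconsistency degree) and the greedy forward-selection reduct are stand-ins, not the authors' algorithm; the paper's weighting by information gain under the maximum inconsistency degree is not reproduced. All names (`MISSING`, `fill_missing`, `greedy_reduct`, etc.) are hypothetical.

```python
"""Hypothetical sketch: information-gain-ordered filling and inconsistency-guided
attribute reduction on a small decision table (not the paper's exact method)."""
from collections import Counter, defaultdict
from math import log2

MISSING = None  # assumed marker for a missing attribute value


def entropy(labels):
    """Shannon entropy (bits) of a sequence of decision labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, decision):
    """ID3-style information gain of `attr`, computed only on rows where `attr` is known."""
    known = [r for r in rows if r[attr] is not MISSING]
    groups = defaultdict(list)
    for r in known:
        groups[r[attr]].append(r[decision])
    base = entropy([r[decision] for r in known])
    conditional = sum(len(g) / len(known) * entropy(g) for g in groups.values())
    return base - conditional


def inconsistency_degree(rows, attrs, decision):
    """Assumed definition: share of objects whose equivalence class under `attrs`
    contains more than one decision value."""
    classes = defaultdict(list)
    for r in rows:
        classes[tuple(r[a] for a in attrs)].append(r[decision])
    conflicting = sum(len(d) for d in classes.values() if len(set(d)) > 1)
    return conflicting / len(rows)


def fill_missing(rows, attrs, decision):
    """Hypothetical filling rule. Attributes are visited in decreasing information
    gain; each missing value gets the observed value with the lowest inconsistency
    degree (ties keep the more frequent value). Returns a filled copy."""
    rows = [dict(r) for r in rows]  # leave the caller's table untouched
    order = sorted(attrs, key=lambda a: information_gain(rows, a, decision), reverse=True)
    for a in order:
        # Observed values, most frequent first, so ties favour the commoner value.
        candidates = [v for v, _ in Counter(
            r[a] for r in rows if r[a] is not MISSING).most_common()]
        for r in rows:
            if r[a] is not MISSING or not candidates:
                continue
            best, best_score = None, None
            for v in candidates:
                r[a] = v  # tentatively fill and score the whole table
                score = inconsistency_degree(rows, attrs, decision)
                if best_score is None or score < best_score:
                    best, best_score = v, score
            r[a] = best
    return rows


def greedy_reduct(rows, attrs, decision):
    """Greedy forward selection with the inconsistency degree as heuristic: add the
    attribute that lowers it most until it matches the full attribute set's value."""
    target = inconsistency_degree(rows, attrs, decision)
    reduct = []
    while inconsistency_degree(rows, reduct, decision) > target:
        best = min((a for a in attrs if a not in reduct),
                   key=lambda a: inconsistency_degree(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


if __name__ == "__main__":
    table = [
        {"a": 0, "b": 0, "c": 0, "d": "n"},
        {"a": 0, "b": 0, "c": 1, "d": "n"},
        {"a": 0, "b": 1, "c": 0, "d": "y"},
        {"a": 1, "b": 0, "c": 1, "d": "y"},
        {"a": 1, "b": 1, "c": 1, "d": "y"},
    ]
    print(greedy_reduct(table, ["a", "b", "c"], "d"))  # ['a', 'b']

    incomplete = [
        {"a": 0, "b": 0, "d": "n"},
        {"a": 1, "b": 1, "d": "y"},
        {"a": 1, "b": 0, "d": "y"},
        {"a": MISSING, "b": 0, "d": "n"},
    ]
    print(fill_missing(incomplete, ["a", "b"], "d")[3]["a"])  # 0
```

On the toy table the greedy search stops at `['a', 'b']`, because those two attributes already reach the full set's inconsistency degree of 0. In the incomplete table the missing value is filled with the less frequent value 0, since 1 would put the last object in a conflicting class with the third. Greedy forward selection is a common reduct heuristic but does not guarantee a minimal reduct.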

Key words: Incomplete, Inconsistency, Filling, Information gain, Attribute reduction

CLC number: TP181