Computer Science (计算机科学), 2015, Vol. 42, Issue (7): 285-290. doi: 10.11896/j.issn.1002-137X.2015.07.061

• Artificial Intelligence •

Group Incremental Feature Selection for Data Sets with Missing Data

WANG Feng, WEI Wei

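  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
  • Publication date: 2018-11-14   Release date: 2018-11-14
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61402272).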

Group Feature Selection Algorithm for Data Sets with Missing Data

WANG Feng, WEI Wei

  • Online: 2018-11-14   Published: 2018-11-14

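Abstract: In real-world applications, data sets usually grow dynamically, and with the rapid development of data acquisition tools, new data typically arrive one group at a time. For dynamic data sets that contain missing data, this paper therefore proposes a group incremental rough feature selection algorithm based on rough set theory. The group incremental formula for computing information entropy is first analyzed and proved, and information entropy is used as the measure of feature significance. On this basis, an entropy-based group incremental feature selection algorithm is designed that can effectively handle dynamic data sets with missing data. Experimental results further demonstrate the feasibility and efficiency of the new algorithm.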

Key words: Dynamic data sets, Missing data, Information entropy, Group incremental feature selection

Abstract: Many real-world data sets grow dynamically in size. With the rapid development of data acquisition tools, new data usually arrive in groups. In this paper, based on rough set theory, a group incremental rough feature selection algorithm is proposed to deal with dynamic data sets with missing data. First, the group incremental mechanism of information entropy is analyzed, and the significance of a feature is then defined based on this mechanism. On this basis, a group incremental feature selection algorithm is constructed that can deal effectively with dynamic data sets with missing data. Experimental results show that the new algorithm is feasible and efficient.

Key words: Dynamic data sets, Missing data, Information entropy, Group feature selection
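
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm or their entropy formula, which this page does not give. It assumes a tolerance relation for missing values (two objects are indistinguishable on a feature if either value is missing or both are equal) and an assumed pairwise form of conditional entropy: the fraction of ordered object pairs that are tolerant on the chosen features yet carry different decision labels. All function names (`tolerant`, `conflict_pairs`, `conditional_entropy`, `update_count`, `select_features`) are hypothetical.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[str]]  # None marks a missing value (assumption)


def tolerant(x: Row, y: Row, features: Sequence[int]) -> bool:
    """True if x and y agree on every feature where both values are known."""
    return all(x[a] is None or y[a] is None or x[a] == y[a] for a in features)


def conflict_pairs(rows_a, labels_a, rows_b, labels_b, features) -> int:
    """Count ordered pairs (x in A, y in B) that are tolerant on `features`
    but carry different decision labels."""
    return sum(
        1
        for x, dx in zip(rows_a, labels_a)
        for y, dy in zip(rows_b, labels_b)
        if dx != dy and tolerant(x, y, features)
    )


def conditional_entropy(rows, labels, features) -> float:
    """Assumed pairwise entropy: conflicting ordered pairs divided by |U|^2."""
    n = len(rows)
    return conflict_pairs(rows, labels, rows, labels, features) / (n * n)


def update_count(old_count, old_rows, old_labels,
                 new_rows, new_labels, features) -> int:
    """Group incremental update of the conflict count when a whole group of
    new objects arrives: pairs inside the old data are reused, and only the
    cross pairs and the pairs inside the new group are counted."""
    cross = conflict_pairs(old_rows, old_labels, new_rows, new_labels, features)
    within_new = conflict_pairs(new_rows, new_labels, new_rows, new_labels, features)
    return old_count + 2 * cross + within_new  # the relation is symmetric


def select_features(rows, labels, n_features) -> List[int]:
    """Greedy forward selection: add the feature that lowers the entropy most
    until the entropy of the full feature set is reached."""
    target = conditional_entropy(rows, labels, list(range(n_features)))
    selected: List[int] = []
    while conditional_entropy(rows, labels, selected) > target:
        candidates = [a for a in range(n_features) if a not in selected]
        best = min(
            candidates,
            key=lambda a: conditional_entropy(rows, labels, selected + [a]),
        )
        selected.append(best)
    return selected
```

Dividing the result of `update_count` by the square of the enlarged data set size gives the entropy of the merged data under this assumed measure, so the pairs inside the old data never have to be recounted when a new group arrives.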

