Computer Science (计算机科学), 2018, Vol. 45, Issue 11A: 445-452.
杨烽
YANG Feng
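Abstract: In data mining, preprocessing that groups the values of symbolic (nominal) data is a challenging problem, and it gives a more compact representation of the data. Earlier work has proposed many solutions, for example approaches based on rough sets. This paper proposes a symbolic data grouping algorithm based on granular computing, which consists of two stages: granule generation and granule selection. In the granule generation stage, for each attribute, the clusters of its attribute values serve as leaf nodes, and granular layers are built bottom-up in the form of a binary tree, producing a forest of attribute trees. In the granule selection stage, each tree is considered globally on the basis of information gain and its optimal granular layer is selected; the selected layers are the grouping result for the symbolic data. Experimental results show that the algorithm produces a more balanced hierarchy and better compression efficiency than existing algorithms, and that it has good application value.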
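The two stages named in the abstract can be illustrated with a minimal Python sketch for a single attribute. The abstract does not specify the merge rule or the selection criterion, so both are assumptions here: layers are built Huffman-style by repeatedly merging the two least frequent groups, and the chosen layer is the coarsest one whose information gain stays within a hypothetical `tolerance` of the finest layer's gain. The function names (`build_layers`, `select_layer`) and the toy data are illustrative and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, grouping):
    """Information gain of the labels given the grouped attribute.

    `grouping` maps each raw attribute value to a group id.
    """
    by_group = {}
    for value, label in zip(values, labels):
        by_group.setdefault(grouping[value], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in by_group.values())
    return entropy(labels) - remainder


def build_layers(values):
    """Granule generation (ASSUMED Huffman-style merge rule).

    Start with one group per distinct value, then repeatedly merge the two
    least frequent groups, recording the partition after every merge.
    """
    groups = [(frozenset([v]), n) for v, n in Counter(values).items()]
    layers = [[g for g, _ in groups]]
    while len(groups) > 1:
        # Sort by frequency; break ties by member names for determinism.
        groups.sort(key=lambda item: (item[1], sorted(item[0])))
        (a, na), (b, nb) = groups[0], groups[1]
        groups = groups[2:] + [(a | b, na + nb)]
        layers.append([g for g, _ in groups])
    return layers


def select_layer(values, labels, tolerance=0.05):
    """Granule selection (ASSUMED rule): keep the coarsest layer whose
    information gain is within `tolerance` bits of the finest layer's."""
    layers = build_layers(values)

    def gain(layer):
        grouping = {v: i for i, group in enumerate(layer) for v in group}
        return information_gain(values, labels, grouping)

    finest_gain = gain(layers[0])
    chosen = layers[0]
    for layer in layers[1:]:  # layers run from finest to coarsest
        if finest_gain - gain(layer) <= tolerance:
            chosen = layer
    return chosen


# Toy data: one symbolic attribute and a class label per record.
values = ["a", "b", "c", "c", "c", "c"]
labels = ["x", "x", "y", "y", "y", "y"]
layers = build_layers(values)
chosen = select_layer(values, labels)
```

On this toy input the sketch merges the two rare values `a` and `b` into one group and keeps `c` separate, because that layer loses no information about the labels while the fully merged layer loses all of it. Running the same procedure per attribute would give one tree per attribute, which corresponds to the forest of attribute trees mentioned in the abstract.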