计算机科学 ›› 2025, Vol. 52 ›› Issue (7): 119-126. doi: 10.11896/jsjkx.240600043

• 计算机软件 •

基于同层粒度类别关联程度计算的多路径选择分层分类

张悦康1, 折延宏2   

  1 西安石油大学计算机学院 西安 710065
  2 西安石油大学理学院 西安 710065
  • 收稿日期:2024-06-05 修回日期:2024-09-24 发布日期:2025-07-17
  • 通讯作者: 折延宏(yanhongshe@xsyu.edu.cn)
  • 作者简介:(ZhangYKtryharder@outlook.com)
  • 基金资助:
    国家自然科学基金(12471442);陕西省自然科学基础研究计划(2023-JC-YB-027);陕西数理基础科学研究项目(23JSQ047);陕西省教育厅高校青年创新团队项目(23JP130)

Hierarchical Classification with Multi-path Selection Based on Calculation of Correlation Degree of Granularity Categories in the Same Level

ZHANG Yuekang1, SHE Yanhong2   

  1 College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China
  2 College of Science, Xi'an Shiyou University, Xi'an 710065, China
  • Received:2024-06-05 Revised:2024-09-24 Published:2025-07-17
  • About author: ZHANG Yuekang, born in 1999, postgraduate. His main research interests include fuzzy rough sets and hierarchical classification.
    SHE Yanhong, born in 1983, Ph.D, professor, is a senior member of CCF (No.43154M). His main research interests include artificial intelligence, rough set theory and uncertainty reasoning.
  • Supported by:
    National Natural Science Foundation of China (12471442), Natural Science Basic Research Plan of Shaanxi Province, China (2023-JC-YB-027), Shaanxi Fundamental Science Research Project for Mathematics and Physics (23JSQ047) and Youth Innovation Team of Shaanxi Universities Funded by the Education Department of Shaanxi Provincial Government (23JP130).

摘要: 分层分类是数据挖掘领域中的一个重要分支,通过挖掘数据之间的信息,将数据有组织地构建为层次结构。然而,层间误差传播是分层分类中一个不可避免的问题。为有效缓解层间误差传播问题,提出一种基于同层类别关联关系的多路径选择的分层分类方法。首先,通过预测类别和真实类别的分布,构造类别之间的相关性矩阵。其次,受点互信息PMI的启发,设计出一种度量同层类别之间的关联程度的方法RPMI,并基于RPMI计算出同层类别之间的关联程度。然后,在层次结构中自上而下地递归使用逻辑回归在每层选择预测类别,并通过选择与预测类别关联程度较大的类别,确定当前层的多个候选类别。最后,使用随机森林从多路径预测的结果中选出最佳预测类别。在5个数据集上对该方法进行评估,证明了其具有较好的分类性能。

关键词: 分层分类, 点互信息, 多路径选择, 统计, 关联程度

Abstract: Hierarchical classification is an important branch of data mining that organizes data into a hierarchical structure by mining the information among data. However, inter-level error propagation is an inevitable problem in hierarchical classification. This paper proposes a hierarchical classification method with multi-path selection based on the association relationship between categories in the same level, which effectively alleviates inter-level error propagation. Firstly, a correlation matrix between categories is constructed from the distributions of the predicted and true categories. Secondly, inspired by pointwise mutual information (PMI), a measure named RPMI is designed, and the degree of correlation between categories in the same level is computed with it. Then, logistic regression is applied recursively from top to bottom in the hierarchical structure to select a predicted category at each level, and multiple candidate categories at the current level are determined by selecting the categories most strongly associated with the predicted one. Finally, a random forest selects the best prediction from the results of the multi-path predictions. The proposed method is evaluated on five datasets and shows good classification performance.
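To make the pipeline described in the abstract concrete, the following Python sketch illustrates its two core ideas: a PMI-style correlation degree computed from the distributions of predicted and true categories, and top-down multi-path prediction in which the predicted category and its most associated sibling are both expanded, with a random forest arbitrating between the resulting candidate leaves. The names (pmi_correlation, TwoLevelMultiPath), the restriction to a two-level hierarchy with integer-coded coarse labels, and the use of plain PMI in place of the paper's RPMI measure are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier


def pmi_correlation(y_true, y_pred, n_classes, eps=1e-12):
    """PMI-style correlation degree between same-level categories.

    counts[i, j] counts how often a sample whose true class is i is
    predicted as class j; log(P(i, j) / (P(i) * P(j))) grows when the two
    categories are frequently confused, i.e. strongly associated.
    Assumes class labels are integers 0 .. n_classes - 1.
    """
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    joint = counts / max(counts.sum(), 1.0)
    p_i = joint.sum(axis=1, keepdims=True)   # marginal over true classes
    p_j = joint.sum(axis=0, keepdims=True)   # marginal over predicted classes
    return np.log((joint + eps) / (p_i @ p_j + eps))


class TwoLevelMultiPath:
    """Two-level tree: coarse classes at level 1, fine (leaf) classes at level 2.

    children maps each coarse label to the list of fine labels under it;
    every branch is assumed to contain at least two fine labels so that its
    local classifier can be trained.
    """

    def __init__(self, children):
        self.children = children
        self.level1 = LogisticRegression(max_iter=1000)
        self.level2 = {}                      # one classifier per coarse class
        self.arbiter = RandomForestClassifier(n_estimators=100, random_state=0)

    def fit(self, X, y_coarse, y_fine):
        X, y_coarse, y_fine = map(np.asarray, (X, y_coarse, y_fine))
        self.level1.fit(X, y_coarse)
        # correlation degree between coarse classes, from training predictions
        self.corr = pmi_correlation(y_coarse, self.level1.predict(X),
                                    len(self.children))
        for c, fine_labels in self.children.items():
            mask = np.isin(y_fine, fine_labels)
            self.level2[c] = LogisticRegression(max_iter=1000).fit(
                X[mask], y_fine[mask])
        self.arbiter.fit(X, y_fine)           # arbitrates between candidate leaves
        return self

    def predict(self, X):
        X = np.asarray(X)
        coarse = self.level1.predict(X)
        rf_proba = self.arbiter.predict_proba(X)
        rf_classes = list(self.arbiter.classes_)
        y_hat = []
        for i, c in enumerate(coarse):
            # keep the predicted coarse class plus its most associated sibling
            siblings = [j for j in np.argsort(self.corr[c])[::-1] if j != c]
            paths = [c, int(siblings[0])]
            leaves = [int(self.level2[b].predict(X[i:i + 1])[0]) for b in paths]
            # the random forest picks the candidate leaf it deems most probable
            best = max(leaves,
                       key=lambda leaf: rf_proba[i, rf_classes.index(leaf)])
            y_hat.append(best)
        return np.array(y_hat)

For deeper hierarchies the same pattern would repeat level by level: each level keeps the predicted category plus the categories most associated with it, and the final arbiter chooses among the leaves reached along the surviving paths.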

Key words: Hierarchical classification, Pointwise mutual information (PMI), Multi-path selection, Statistics, Correlation degree

中图分类号: TP311