计算机科学 ›› 2019, Vol. 46 ›› Issue (6): 69-74. doi: 10.11896/j.issn.1002-137X.2019.06.009
ZENG Qing-tian1,2, LIU Chen-zheng1, NI Wei-jian1, DUAN Hua3
Abstract: Ordinal regression (also called ordinal classification) is a supervised learning task in which data items are classified with labels that carry a natural order. It is closely related to many practical problems, and research on ordinal regression has attracted increasing attention in recent years. Like other supervised learning tasks (classification, regression, etc.), ordinal regression relies on feature extraction to improve model efficiency and accuracy. Although feature extraction has been widely studied and applied in classification tasks, it has received little attention in ordinal regression. It is well known that, compared with single features, combined features can express more of the underlying semantics of the data, yet adding arbitrary combined features rarely improves model accuracy. Building on frequent pattern mining, this paper uses the Kullback-Leibler (K-L) divergence to select the most discriminative frequent patterns for feature combination, and proposes a new combined feature extraction method for ordinal regression. Experiments with several ordinal regression models were conducted on both public and proprietary datasets. The results show that combined features built from the most discriminative frequent patterns effectively improve the training performance of most ordinal regression models.
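The selection step described in the abstract can be sketched as follows: mine frequent patterns, score each pattern by the K-L divergence between the label distribution of the items it covers and the overall label distribution, and keep the top-scoring patterns as new combined features. This is a minimal illustration, not the paper's implementation: the pair-only miner (a stand-in for a full Apriori/FP-growth pass), the `min_support` threshold, and the toy data are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log

def kl_divergence(p, q, eps=1e-9):
    # KL(p || q) over two aligned label distributions
    return sum(pi * log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def mine_frequent_pairs(rows, min_support=0.2):
    # Enumerate frequent feature pairs; a real system would use a
    # frequent-pattern miner (Apriori, FP-growth) over longer itemsets.
    n = len(rows)
    counts = Counter()
    for row in rows:
        for pair in combinations(sorted(row), 2):
            counts[pair] += 1
    return [p for p, c in counts.items() if c / n >= min_support]

def label_dist(labels, classes):
    # Empirical distribution of the (ordinal) labels
    c = Counter(labels)
    return [c[k] / len(labels) for k in classes]

def top_discriminative_patterns(rows, labels, k=2, min_support=0.2):
    # Rank frequent patterns by KL(covered-label dist || overall dist);
    # a high score means the pattern concentrates on few label values.
    classes = sorted(set(labels))
    overall = label_dist(labels, classes)
    scored = []
    for pat in mine_frequent_pairs(rows, min_support):
        covered = [y for row, y in zip(rows, labels) if set(pat) <= row]
        scored.append((kl_divergence(label_dist(covered, classes), overall), pat))
    scored.sort(reverse=True)
    return [pat for _, pat in scored[:k]]
```

The selected patterns would then be added as binary indicator features (pattern present/absent in an item) before training an ordinal regression model.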
Related articles:
[1] 李斌, 万源. Unsupervised Multi-view Feature Selection Based on Similarity Matrix Learning and Matrix Alignment. 计算机科学, 2022, 49(8): 86-96. https://doi.org/10.11896/jsjkx.210700124
[2] 帅剑波, 王金策, 黄飞虎, 彭舰. Click-Through Rate Prediction Model Based on Neural Architecture Search. 计算机科学, 2022, 49(7): 10-17. https://doi.org/10.11896/jsjkx.210600009
[3] 胡艳羽, 赵龙, 董祥军. Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification. 计算机科学, 2022, 49(7): 73-78. https://doi.org/10.11896/jsjkx.210500092
[4] 康雁, 王海宁, 陶柳, 杨海潇, 杨学昆, 王飞, 李浩. Hybrid Improved Flower Pollination Algorithm and Gray Wolf Algorithm for Feature Selection. 计算机科学, 2022, 49(6A): 125-132. https://doi.org/10.11896/jsjkx.210600135
[5] 储安琪, 丁志军. Application of Gray Wolf Optimization Algorithm on Synchronous Processing of Sample Equalization and Feature Selection in Credit Evaluation. 计算机科学, 2022, 49(4): 134-139. https://doi.org/10.11896/jsjkx.210300075
[6] 孙林, 黄苗苗, 徐久成. Weak Label Feature Selection Method Based on Neighborhood Rough Sets and Relief. 计算机科学, 2022, 49(4): 152-160. https://doi.org/10.11896/jsjkx.210300094
[7] 李宗然, 陈秀宏, 陆赟, 邵政毅. Robust Joint Sparse Uncorrelated Regression. 计算机科学, 2022, 49(2): 191-197. https://doi.org/10.11896/jsjkx.210300034
[8] 张叶, 李志华, 王长杰. Kernel Density Estimation-based Lightweight IoT Anomaly Traffic Detection Method. 计算机科学, 2021, 48(9): 337-344. https://doi.org/10.11896/jsjkx.200600108
[9] 杨蕾, 降爱莲, 强彦. Structure Preserving Unsupervised Feature Selection Based on Autoencoder and Manifold Regularization. 计算机科学, 2021, 48(8): 53-59. https://doi.org/10.11896/jsjkx.200700211
[10] 侯春萍, 赵春月, 王致芃. Video Abnormal Event Detection Algorithm Based on Self-feedback Optimal Subclass Mining. 计算机科学, 2021, 48(7): 199-205. https://doi.org/10.11896/jsjkx.200800146
[11] 胡艳梅, 杨波, 多滨. Logistic Regression with Regularization Based on Network Structure. 计算机科学, 2021, 48(7): 281-291. https://doi.org/10.11896/jsjkx.201100106
[12] 吴成凤, 蔡莉, 李劲, 梁宇. Frequent Pattern Mining of Residents' Travel Based on Multi-source Location Data. 计算机科学, 2021, 48(7): 155-163. https://doi.org/10.11896/jsjkx.200800072
[13] 周钢, 郭福亮. Research on Ensemble Learning Method Based on Feature Selection for High-dimensional Data. 计算机科学, 2021, 48(6A): 250-254. https://doi.org/10.11896/jsjkx.200700102
[14] 丁思凡, 王锋, 魏巍. Relief Feature Selection Algorithm Based on Label Correlation. 计算机科学, 2021, 48(4): 91-96. https://doi.org/10.11896/jsjkx.200800025
[15] 滕俊元, 高猛, 郑小萌, 江云松. Noise Tolerable Feature Selection Method for Software Defect Prediction. 计算机科学, 2021, 48(12): 131-139. https://doi.org/10.11896/jsjkx.201000168