Computer Science ›› 2019, Vol. 46 ›› Issue (6): 69-74. doi: 10.11896/j.issn.1002-137X.2019.06.009

• Big Data & Data Science •

  • Corresponding authors: LIU Chen-zheng, born in 1994, postgraduate, whose main research interests include data mining and artificial intelligence; NI Wei-jian, born in 1981, Ph.D, associate professor, whose main research interests include machine learning, data mining and information retrieval, E-mail: niweijian@gmail.com
  • About the authors: ZENG Qing-tian, born in 1976, professor, Ph.D supervisor, CCF member, whose main research interests include process mining, intelligent information processing, personalized recommendation and artificial intelligence, E-mail: qtzeng@163.com; DUAN Hua, born in 1976, Ph.D, associate professor, whose main research interests include machine learning and optimization algorithms.
  • Supported by: National Natural Science Foundation of China (61472229, 61702306, 61602278, 61602279), Science and Technology Development Program of Shandong Province (2016ZDJS02A11, ZR2017BF015, ZR2017MF027), Taishan Scholar Climbing Program of Shandong Province, and Scientific Research and Innovation Team Support Program of Shandong University of Science and Technology (2015TDJH102).

Combined Feature Extraction Method for Ordinal Regression

ZENG Qing-tian1,2, LIU Chen-zheng1, NI Wei-jian1, DUAN Hua3   

  1. (College of Computer Science and Engineering,Shandong University of Science and Technology,Qingdao,Shandong 266590,China)1
    (College of Electronic and Information Engineering,Shandong University of Science and Technology,Qingdao,Shandong 266590,China)2
    (College of Mathematics and Systems Science,Shandong University of Science and Technology,Qingdao,Shandong 266590,China)3
  • Received:2018-06-20 Published:2019-06-24


Abstract: Ordinal regression, also known as ordinal classification, is a supervised learning task that classifies data items using labels with a natural order. Ordinal regression is closely related to many practical problems, and research on it has attracted increasing attention in recent years. Like other supervised learning tasks (classification, regression, etc.), ordinal regression requires feature extraction to improve the efficiency and accuracy of the model. However, while feature extraction has been extensively studied for classification tasks, it has received little attention in ordinal regression. It is well known that combined features can capture more of the underlying data semantics than single features, but simply adding arbitrary combined features rarely improves model accuracy. Based on frequent pattern mining, this paper uses the K-L divergence to select the most discriminative frequent patterns for feature combination, and proposes a new combined feature extraction method for ordinal regression. Multiple ordinal regression models are used for validation on both public and our own datasets. The experimental results show that combined features built from the most discriminative frequent patterns can effectively improve the training performance of most ordinal regression models.
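The selection criterion described in the abstract can be sketched in a few lines. The toy Python sketch below is an illustrative assumption, not the paper's implementation: the data, the support threshold, and the restriction to item pairs (a stand-in for a full frequent-pattern miner such as FP-growth) are all made up. It ranks frequent feature co-occurrences by the K-L divergence between the pattern-conditional label distribution and the overall label distribution, then appends the top patterns as binary combined features:

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each sample is a set of binary feature names
# plus an ordinal label (0 < 1 < 2).
samples = [
    ({"a", "b"}, 0), ({"a", "b", "c"}, 0), ({"a"}, 1),
    ({"b", "c"}, 1), ({"a", "c"}, 2), ({"a", "b", "c"}, 2),
]
min_support = 2  # absolute support threshold (assumed)

# 1) Mine frequent feature pairs (a minimal stand-in for FP-growth).
pair_support = Counter()
for feats, _ in samples:
    for pair in combinations(sorted(feats), 2):
        pair_support[pair] += 1
frequent = [p for p, s in pair_support.items() if s >= min_support]

# 2) Score each pattern by the K-L divergence between the label
#    distribution of samples containing the pattern and the overall
#    label distribution: higher divergence = more discriminative.
labels = [y for _, y in samples]
overall = Counter(labels)

def kl_score(pattern):
    sub = Counter(y for feats, y in samples if set(pattern) <= feats)
    n_sub, n_all = sum(sub.values()), len(samples)
    score = 0.0
    for y, c in sub.items():
        p = c / n_sub           # P(y | pattern)
        q = overall[y] / n_all  # P(y)
        score += p * math.log(p / q)
    return score

# 3) Keep the most discriminative patterns as combined features.
top = sorted(frequent, key=kl_score, reverse=True)[:2]

# 4) Append each selected pattern as a binary indicator feature.
augmented = [
    (feats, [int(set(p) <= feats) for p in top], y)
    for feats, y in samples
]
```

A pattern whose conditional label distribution matches the prior (K-L divergence near zero) carries no ordinal class information and is filtered out, which is why selecting by divergence rather than by raw support avoids flooding the model with uninformative combined features.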

Key words: Feature combination, Feature selection, Frequent pattern, Ordinal regression

CLC number: TP391