Computer Science, 2022, Vol. 49, Issue (6A): 242-246. doi: 10.11896/jsjkx.210200108
WANG Wen-qiang (王文强)1, JIA Xing-xing (贾星星)1,2, LI Peng (李朋)1
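Abstract: Ordinal variables are often used to express people's attitudes and preferences. For example, in recommender systems a consumer's rating of a product is an ordinal variable, and in natural language processing the sentiment label in sentiment analysis is also ordinal. Academic work currently handles ordinal variables with the ordinal Logit model, but ordinal Logit regression requires the ordinal variable to be roughly uniformly distributed; when this distributional requirement is not met well, its predictions of the ordinal variable are unsatisfactory. To address this, this paper proposes an adaptive ensemble ordinal algorithm. First, drawing on the idea of Boosting, a Boosting-like algorithm is proposed: an ordinal multilayer perceptron model and an ordinal random forest model are constructed following the idea of ordinal Logit regression, and these two models, together with the Softmax multi-class model and the ordinal Logit model, make up the Boosting-like algorithm. During data processing, when the predictions of the four models for a sample are not all identical, that sample enters the Boosting-like model for further training, and training stops once the number of training rounds exceeds a threshold. Then, a random forest model is used to construct a mapping function from all predicted values on the training set to the true values. The proposed algorithm retains high prediction accuracy when the ordinal variable follows an arbitrary distribution, which greatly extends the applicability of the ordinal Logit regression model. When used to predict wine quality on the white wine quality and red wine quality datasets, its accuracy is higher than that of the ordinal Logit model, the Softmax multi-class algorithm, the multilayer perceptron, and KNN.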
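As a rough illustration of two ingredients named in the abstract, the sketch below computes class probabilities under a standard cumulative (proportional-odds) ordinal Logit model and selects the samples on which several models disagree. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ordinal_logit_probs` and `disagreement_indices`, the parameters `weights` and `thresholds`, and the example values are all hypothetical, and the abstract does not specify how the disagreeing samples are retrained.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordinal_logit_probs(x, weights, thresholds):
    """Class probabilities under a cumulative (proportional-odds) logit model.

    P(y <= j | x) = sigmoid(thresholds[j] - w.x) for j = 0..K-2, and
    P(y = j | x) is the difference of consecutive cumulative probabilities.
    `thresholds` must be strictly increasing; K = len(thresholds) + 1.
    All names here are illustrative assumptions.
    """
    score = sum(w * xi for w, xi in zip(weights, x))
    # Cumulative probabilities, with P(y <= K-1) = 1 appended for the last class.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


def disagreement_indices(predictions_per_model):
    """Indices of samples whose predictions are not identical across all models.

    `predictions_per_model` is a list with one prediction list per model.
    """
    return [
        i
        for i, preds in enumerate(zip(*predictions_per_model))
        if len(set(preds)) > 1
    ]


# Example with made-up numbers: four ordered classes, two features.
probs = ordinal_logit_probs([0.5, -1.0], [1.0, 0.5], [-1.0, 0.0, 1.5])

# Example with made-up predictions from four models on three samples.
hard_samples = disagreement_indices([[1, 2, 3], [1, 2, 2], [1, 2, 3], [1, 3, 3]])
```

In this sketch the four probabilities sum to one because they are differences of an increasing cumulative sequence that ends at 1, and `hard_samples` holds the indices that, per the abstract, would be passed on for further training rounds.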