Computer Science, 2022, Vol. 49, Issue (6A): 242-246. doi: 10.11896/jsjkx.210200108

• Big Data & Data Science •

  • Corresponding author: JIA Xing-xing (jiaxx@lzu.edu.cn)
  • About the author: (2303858285@qq.com)

Adaptive Ensemble Ordering Algorithm

WANG Wen-qiang1, JIA Xing-xing1,2, LI Peng1   

  1. 1 School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, China
    2 Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 451000, China
  • Online: 2022-06-10 Published: 2022-06-08
  • About author: WANG Wen-qiang, born in 1996, postgraduate. His main research interests include statistical theory and its application.
    JIA Xing-xing, born in 1982, associate professor, master supervisor. Her main research interests include secret sharing, visual cryptography and data science.
  • Supported by:
    National Natural Science Foundation of China (61902164, 61972225), Fundamental Research Funds for the Chinese Central Universities (lzujbky-2021-53), Natural Science Foundation of Gansu Province of China (20JR5RA286) and Guangxi Key Laboratory of Trusted Software (KX201907).

Abstract: Ordinal variables express people's attitudes and preferences. For example, in recommendation systems, consumers' ratings of goods are ordinal variables, and the sentiment labels in NLP sentiment analysis are also ordinal variables. At present, the ordered Logit model is commonly used to handle ordinal variables. However, the ordered Logit regression model requires the ordinal variable to be roughly uniformly distributed; when it is not, the model's predictions are not ideal. To address this, this paper proposes an adaptive ensemble ordering algorithm. First, drawing on the idea of boosting, a boosting-like algorithm is proposed: following the concept of the ordered Logit regression model, an ordered multi-layer perceptron model and an ordered random forest model are constructed, and these two models, together with the Softmax multi-class model and the ordered Logit model, constitute the boosting-like algorithm. During data processing, when the predictions of the four models for a sample are not all identical, the sample enters the boosting-like model for further training, and training stops once the number of rounds exceeds a given threshold. Then, a random forest model is used to construct the mapping from all predicted values on the training set to the true values. The proposed algorithm retains high prediction accuracy when the ordinal variable follows an arbitrary distribution, which greatly extends the applicability of the ordered Logit regression model. When applied to the Baijiu quality dataset and the red wine quality dataset to predict wine quality, its accuracy is higher than that of the ordered Logit model, the Softmax multi-class algorithm, the multi-layer perceptron, and KNN.

Key words: Ensemble algorithm, Multi-layer perceptron, Ordered Logit regression model, Ordinal variables, Random forest algorithm
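
The abstract builds on two ideas: the threshold structure of the ordered Logit model, and routing samples on which the four base models disagree into further training rounds. The following is a minimal sketch of both, not the authors' implementation. It assumes the standard cumulative (proportional-odds) form P(y ≤ k | x) = sigmoid(θ_k − w·x); the function names, the cut points, and the prediction lists are hypothetical and chosen only for illustration.

```python
import math

def ordered_logit_probs(score, thresholds):
    """Class probabilities of a cumulative (proportional-odds) logit model.

    score      -- linear predictor w.x for one sample (assumed precomputed)
    thresholds -- increasing cut points theta_1 < ... < theta_{K-1}
    """
    # P(y <= k) = sigmoid(theta_k - score); the last cumulative value is 1.
    cum = [1.0 / (1.0 + math.exp(-(t - score))) for t in thresholds] + [1.0]
    # P(y = k) is the difference of neighbouring cumulative probabilities.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

def predict_grade(score, thresholds):
    """Return the most probable ordinal grade, numbered from 0."""
    probs = ordered_logit_probs(score, thresholds)
    return max(range(len(probs)), key=probs.__getitem__)

def disagreeing_samples(predictions):
    """Indices of samples on which the base models do not all agree.

    predictions -- one list of predicted grades per model, all the same length
    """
    return [i for i, votes in enumerate(zip(*predictions)) if len(set(votes)) > 1]

# Toy usage: three hypothetical cut points give four ordinal grades.
thresholds = [-1.0, 0.0, 1.5]
probs = ordered_logit_probs(0.4, thresholds)
grade = predict_grade(0.4, thresholds)

# Made-up predictions from four hypothetical base models on five samples.
preds = [
    [0, 1, 2, 3, 1],
    [0, 1, 2, 2, 1],
    [0, 1, 1, 3, 1],
    [0, 1, 2, 3, 1],
]
# Samples 2 and 3 receive mixed votes, so they would be trained on again.
retrain_idx = disagreeing_samples(preds)
```

In a fuller pipeline, the indices returned by `disagreeing_samples` would select the samples passed to the next training round until the round threshold is reached, and the collected predictions would then serve as input features for the final random forest mapping described in the abstract.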

CLC Number:

  • TP391