Computer Science ›› 2017, Vol. 44 ›› Issue (6): 212-215. doi: 10.11896/j.issn.1002-137X.2017.06.035

• Artificial Intelligence •

Classification Algorithm Using Linear Regression and Attribute Ensemble

QIANG Bao-hua, TANG Bo, WANG Yu-feng, ZOU Xian-chun, LIU Zheng-li, SUN Zhong-xu and XIE Wu

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, and Guangxi Collaborative Innovation Center of Cloud Computing and Big Data, Guilin 541004, China; 2. The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang 050081, China; 3. College of Computer and Information Science, Southwest University, Chongqing 400715, China
  • Online: 2018-11-13  Published: 2018-11-13
  • Supported by:
    This work was supported by the National Marine Technology Public Welfare Project (201505002), the National Natural Science Foundation of China (61462020), the Open Project of the Guangxi Key Laboratory of Trusted Software (KX201510), the Guangxi Collaborative Innovation Project of Cloud Computing and Big Data (YD16E04), and the Postgraduate Innovation Project (YJCXS201538).

Abstract: For classification problems on high-dimensional, small-sample data, the complexity of the high-dimensional attributes limits the predictive accuracy of the classification model. To further improve accuracy, a classification algorithm using linear regression and attribute ensemble (LRAE) was proposed. First, linear regression is used to construct an attribute linear classifier (ALC) for each attribute. Second, to avoid the drop in accuracy caused by too many ALCs, the empirical loss value from the empirical risk minimization strategy is used as the evaluation criterion to select ALCs. Finally, the selected ALCs are combined by majority voting. Experiments on high-dimensional, small-sample gene expression datasets show that the LRAE algorithm achieves higher accuracy than logistic regression, support vector machine, and random forest algorithms.

Key words: Linear regression, Single attribute classification, Empirical loss, Attribute ensemble, Majority voting method
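
The abstract outlines a three-step procedure: fit a one-attribute linear regression classifier (ALC) per attribute, rank the ALCs by empirical loss and keep only the best ones, and combine the retained ALCs by majority voting. The Python sketch below is an illustrative reading of that description, not the authors' implementation: it assumes binary 0/1 labels, uses training-set mean squared error as the empirical loss, thresholds regression outputs at 0.5, and keeps a fixed number of ALCs; the names AttributeLinearClassifier, LRAEClassifier, and n_selected are invented for the example.

```python
# Illustrative sketch only (assumptions: binary 0/1 labels, squared error as the
# empirical loss, 0.5 decision threshold, fixed number of retained ALCs).
import numpy as np


class AttributeLinearClassifier:
    """Single-attribute least-squares regressor thresholded at 0.5 (an 'ALC')."""

    def __init__(self, attr_index):
        self.attr_index = attr_index
        self.w = 0.0
        self.b = 0.0
        self.empirical_loss = np.inf

    def fit(self, X, y):
        x = X[:, self.attr_index]
        # Ordinary least squares for y ~ w*x + b on a single attribute.
        A = np.column_stack([x, np.ones_like(x)])
        (self.w, self.b), *_ = np.linalg.lstsq(A, y, rcond=None)
        # Empirical loss: mean squared error on the training sample,
        # used later as the ranking criterion when ALCs are selected.
        self.empirical_loss = float(np.mean((A @ np.array([self.w, self.b]) - y) ** 2))
        return self

    def predict(self, X):
        score = self.w * X[:, self.attr_index] + self.b
        return (score >= 0.5).astype(int)


class LRAEClassifier:
    """Keeps the n_selected ALCs with the smallest empirical loss; majority vote."""

    def __init__(self, n_selected=50):
        self.n_selected = n_selected
        self.alcs = []

    def fit(self, X, y):
        candidates = [AttributeLinearClassifier(j).fit(X, y) for j in range(X.shape[1])]
        candidates.sort(key=lambda alc: alc.empirical_loss)  # smaller loss ranks higher
        self.alcs = candidates[: min(self.n_selected, len(candidates))]
        return self

    def predict(self, X):
        votes = np.stack([alc.predict(X) for alc in self.alcs])  # (n_alcs, n_samples)
        return (votes.mean(axis=0) >= 0.5).astype(int)           # majority vote


if __name__ == "__main__":
    # Toy high-dimensional, small-sample data (not the gene expression sets from the paper).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 500))
    y = (X[:, :5].sum(axis=1) > 0).astype(int)  # only 5 of 500 attributes are informative
    model = LRAEClassifier(n_selected=20).fit(X[:40], y[:40])
    print("held-out accuracy:", (model.predict(X[40:]) == y[40:]).mean())
```

In practice the number of retained ALCs would be tuned (for example by cross-validation); this toy example does not reproduce the paper's comparison against logistic regression, SVM, and random forest on gene expression data.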

