Computer Science ›› 2019, Vol. 46 ›› Issue (6A): 407-411.

• Big Data & Data Mining •

Linear Twin Support Vector Machine Based on Data Distribution Characteristics

SONG Rui-yang¹, MENG Hua¹,², LONG Zhi-guo²

  1. School of Mathematics, Southwest Jiaotong University, Chengdu 611756, China
  2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
  • Online: 2019-06-14 Published: 2019-07-02
  • Corresponding author: MENG Hua (born 1982), male, Ph.D. Main research interests: knowledge representation and reasoning, machine learning. E-mail: menghua@home.swjtu.edu.cn
  • About the authors: SONG Rui-yang (born 1997), male. Main research interests: data mining, machine learning. LONG Zhi-guo (born 1989), male, Ph.D. Main research interests: knowledge representation and reasoning, machine learning.
  • Funding:
    This work was supported by NSFC (61773324), the Humanities and Social Sciences Project of the Ministry of Education (18XJC72040001), and the Fundamental Research Funds for the Central Universities (2682016CX114, 2682018CX25).

Abstract: Twin support vector machines (TWSVM) have been applied successfully in many fields. However, the standard TWSVM model is not robust when classifying data that carry distribution characteristics. In particular, when the data are highly uncertain, a standard classification model that ignores the distribution of the sample points can no longer meet classification accuracy requirements. This paper therefore proposes a weighted linear twin support vector machine based on data distribution characteristics, denoted TWSVM-U. Building on TWSVM, the model accounts for the influence of the data distribution on the positions of the classification hyperplanes and constructs distance weights quantitatively from the dispersion of the data along the normal directions of those hyperplanes. TWSVM-U is a generalization of TWSVM: when the training samples have no distribution characteristics, it degenerates to the standard TWSVM model. Experiments with 10-fold cross-validation show that TWSVM-U outperforms both SVM and TWSVM on classification problems involving uncertain data with a large fluctuation range.

Key words: Binary classification, Twin support vector machine, Uncertain information, Weighted distance
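
To make the idea of weighting by dispersion along the hyperplane normal concrete, the following is a minimal sketch and not the TWSVM-U formulation itself, which the abstract does not spell out. It rests on several assumptions: the closed-form least-squares twin SVM is swapped in for the quadratic-programming TWSVM, each sample's uncertainty is given as per-feature standard deviations with independent features, and the weight `1 / (1 + s)` and the reweighting loop are hypothetical choices. All function and parameter names are illustrative.

```python
import numpy as np


def dispersion_along_normal(w, sigma):
    """Std. dev. of each sample projected onto the unit normal w/||w||.

    sigma has shape (m, d): per-feature standard deviations of each sample,
    with features assumed independent (an assumption of this sketch).
    """
    n = w / np.linalg.norm(w)
    return np.sqrt((sigma ** 2) @ (n ** 2))


def _solve_plane(own, other, c, weights, target):
    """Closed-form least-squares twin plane: near `own`, at `target` on `other`."""
    W = np.hstack([own, np.ones((len(own), 1))])
    O = np.hstack([other, np.ones((len(other), 1))])
    lhs = O.T @ O + (W.T @ (weights[:, None] * W)) / c
    lhs += 1e-8 * np.eye(lhs.shape[0])  # small ridge for numerical stability
    u = target * np.linalg.solve(lhs, O.T @ np.ones(len(other)))
    return u[:-1], u[-1]


def fit_weighted_lstwsvm(A, B, sigma_A=None, sigma_B=None, c=1.0, n_iter=5):
    """Fit two nonparallel planes; A is the +1 class, B is the -1 class."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    sigma_A = np.zeros_like(A) if sigma_A is None else np.asarray(sigma_A, float)
    sigma_B = np.zeros_like(B) if sigma_B is None else np.asarray(sigma_B, float)
    d_A, d_B = np.ones(len(A)), np.ones(len(B))  # start unweighted
    for _ in range(n_iter):
        w1, b1 = _solve_plane(A, B, c, d_A, -1.0)
        w2, b2 = _solve_plane(B, A, c, d_B, +1.0)
        # Hypothetical weight: samples that are more dispersed along the
        # current normal contribute less to their own plane's distance term.
        d_A = 1.0 / (1.0 + dispersion_along_normal(w1, sigma_A))
        d_B = 1.0 / (1.0 + dispersion_along_normal(w2, sigma_B))
    return (w1, b1), (w2, b2)


def predict(X, plane1, plane2):
    """Assign each row of X to the class whose plane is nearer."""
    (w1, b1), (w2, b2) = plane1, plane2
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)
```

With `sigma_A` and `sigma_B` left at zero, every weight stays at 1 and the loop returns the unweighted least-squares twin planes, which mirrors the degeneration to the standard model described in the abstract. A call such as `fit_weighted_lstwsvm(A, B, sigma_A, sigma_B)` followed by `predict(X, plane1, plane2)` covers training and classification.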

CLC Number:

  • TP301