Computer Science (计算机科学), 2018, Vol. 45, Issue 6: 251-258. doi: 10.11896/j.issn.1002-137X.2018.06.045
吕巨建1,2, 赵慧民1,2, 陈荣军1, 李键红3
LV Ju-jian1,2, ZHAO Hui-min1,2, CHEN Rong-jun1, LI Jian-hong3
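Abstract: In many information processing tasks, large numbers of unlabeled samples are easy to obtain, but labeling them is time-consuming and labor-intensive. Active learning, an important method in machine learning, reduces the cost of manual labeling by selecting the most informative samples to label. However, most existing active learning algorithms are supervised methods built on a classifier, and they are not suitable for selecting samples when no label information is available. To address this problem, this paper draws on the idea of optimal experimental design and combines it with adaptive sparse neighborhood reconstruction theory to propose an active learning algorithm based on adaptive sparse neighborhood reconstruction. The algorithm adaptively chooses the neighborhood size according to the distribution in each region of the dataset and completes the neighbor search and the computation of reconstruction coefficients in a single step. Without any label information, it can select the samples that best represent the distribution structure of the sample set. Experiments on synthetic and real datasets show that, at the same labeling cost, the proposed algorithm performs well in both classification accuracy and robustness.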
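The abstract does not state the objective function, so the following is only an illustrative sketch of the general idea of sparse neighborhood reconstruction, not the authors' algorithm. It assumes that each sample is reconstructed from all other samples under an L1 penalty, so that the samples receiving nonzero weight form a neighborhood whose size is not fixed in advance. The function name `sparse_neighborhood`, the penalty weight `lam`, the iteration count `n_iter`, and the ISTA solver are all assumptions made for illustration.

```python
import numpy as np


def sparse_neighborhood(X, i, lam=0.1, n_iter=500):
    """Reconstruct sample i from the other samples with an L1 penalty.

    Approximately minimizes 0.5 * ||x_i - D w||^2 + lam * ||w||_1 with ISTA,
    where the columns of D are the remaining samples. Returns the indices
    that receive nonzero weight (the neighborhood) and their weights.
    """
    others = np.delete(np.arange(X.shape[0]), i)
    D = X[others].T                       # shape (d, n - 1)
    x = X[i]
    # Step size 1 / L, where L is the squared spectral norm of D.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - x)          # gradient of the squared-error term
        z = w - step * grad
        # Soft thresholding sets small coefficients exactly to zero.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    keep = np.abs(w) > 1e-8
    return others[keep], w[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 5))
    neighbors, weights = sparse_neighborhood(X, i=3, lam=0.5)
    print("neighbors of sample 3:", neighbors)
    print("weights:", weights)
```

In this sketch the neighbors and their reconstruction coefficients come out of the same optimization, and a larger `lam` yields a smaller neighborhood. How the paper combines such coefficients with optimal experimental design to pick the samples to label is not described in the abstract and is not attempted here.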