Computer Science (计算机科学), 2018, Vol. 45, Issue 6: 251-258. doi: 10.11896/j.issn.1002-137X.2018.06.045

• Artificial Intelligence •

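Unsupervised Active Learning Based on Adaptive Sparse Neighbors Reconstruction

LV Ju-jian (1,2), ZHAO Hui-min (1,2), CHEN Rong-jun (1), LI Jian-hong (3)

  1. Guangdong Polytechnic Normal University, Guangzhou 510665, China;
  2. Key Laboratory of Guangzhou Digital Content Processing and Security Technology, Guangzhou 510665, China;
  3. Language Engineering and Computing Laboratory, Guangdong University of Foreign Studies, Guangzhou 510006, China
  • Received: 2017-01-11  Online: 2018-06-15  Published: 2018-07-24
  • About the authors: LV Ju-jian (吕巨建; born 1984), male, Ph.D., lecturer; his main research interests include machine learning, signal and information processing, and computer vision; E-mail: jujianlv@163.com (corresponding author). ZHAO Hui-min (赵慧民; born 1966), male, Ph.D., professor; his main research interests include information security and signal and information processing. CHEN Rong-jun (陈荣军; born 1978), male, Ph.D., associate professor; his main research interests include Internet of Things technology and intelligent information systems. LI Jian-hong (李键红; born 1981), male, Ph.D., assistant researcher; his main research interests include speech processing, image processing, and machine learning.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61672008), the Key Project of the Natural Science Foundation of Guangdong Province (2016A030311013), the Major International Cooperation Project of Guangdong Universities (2015KGJHZ021), and the Natural Science Foundation of Guangdong Province (2016A030310335).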

Abstract: In many information processing tasks, large amounts of unlabeled data are easy to obtain, but labeling them is time-consuming and labor-intensive. As an important learning method in machine learning, active learning reduces the cost of manual labeling by selecting the most informative samples to label. However, most existing active learning algorithms are classifier-based supervised methods, which are not suitable for sample selection when no label information is available. To address this problem, this paper draws on the idea of optimal experimental design and combines it with adaptive sparse neighbors reconstruction to propose a novel unsupervised active learning algorithm, called active learning based on adaptive sparse neighbors reconstruction. The proposed algorithm adaptively selects the neighborhood size according to the distribution of each region of the dataset, searches for the sparse neighbors and computes the reconstruction coefficients simultaneously, and can select the samples that best represent the distribution structure of the dataset without any label information. Experiments on both synthetic and real-world datasets show that, under the same labeling cost, the proposed algorithm achieves high classification accuracy and robustness.

Key words: Active learning, Local linear reconstruction, Optimal experimental design, Sparse reconstruction, Transductive experimental design
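
The abstract's central idea, finding a sample's neighbors and their reconstruction coefficients in one step, can be illustrated with a minimal sketch. It assumes a plain L1-penalized least-squares reconstruction solved by coordinate descent; the paper's actual objective, constraints, and solver are not stated on this page, and the function name `sparse_neighbors`, the penalty `lam`, and the toy data are hypothetical. Samples that receive nonzero coefficients act as the neighbors, so the neighborhood size can differ from sample to sample.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_neighbors(points, target, lam=0.1, sweeps=50):
    """Sketch only (assumed formulation, not the paper's exact method).

    Approximately minimizes 0.5 * ||x_t - sum_j w_j x_j||^2 + lam * sum_j |w_j|
    over all j != target by coordinate descent, and returns {index: weight}
    for the nonzero weights. The nonzero indices are the sparse neighbors.
    """
    y = points[target]
    others = [j for j in range(len(points)) if j != target]
    weights = {j: 0.0 for j in others}
    residual = list(y)  # equals y - sum_j w_j x_j; all weights start at zero
    for _ in range(sweeps):
        for j in others:
            xj = points[j]
            norm_sq = sum(v * v for v in xj)
            if norm_sq == 0.0:
                continue
            # Correlation of x_j with the residual that excludes x_j's own term.
            rho = sum(a * r for a, r in zip(xj, residual)) + norm_sq * weights[j]
            new_w = soft_threshold(rho, lam) / norm_sq
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, xj)]
                weights[j] = new_w
    return {j: w for j, w in weights.items() if abs(w) > 1e-9}


# Toy data (hypothetical): five samples in three dimensions.
points = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]

print(sorted(sparse_neighbors(points, 3)))  # -> [0, 1]
print(sorted(sparse_neighbors(points, 4)))  # -> [2]
```

In this toy data, sample 3 is reconstructed from two samples while sample 4 needs only one, which is the sense in which the neighborhood size adapts to the data rather than being fixed in advance.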

CLC Number:

  • TP181