Computer Science, 2022, Vol. 49, Issue 6A: 113-118. doi: 10.11896/jsjkx.210500034
侯夏晔1, 陈海燕1,3, 张兵1, 袁立罡2, 贾亦真1
HOU Xia-ye1, CHEN Hai-yan1,3, ZHANG Bing1, YUAN Li-gang2, JIA Yi-zhen1
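Abstract: Metric learning is an important research topic in machine learning, and the quality of the learned metric directly affects the performance of downstream learning algorithms. Most existing work on metric learning assumes a supervised setting. In practice, however, large amounts of data are often unlabeled, or labels can be obtained only at high cost. To address this problem, an active metric learning algorithm based on support vector machines (ASVM2L) is proposed for the semi-supervised setting. First, a small number of samples are randomly selected from the unlabeled pool and handed to an expert for labeling, and these samples are used to train a support vector machine metric learner. Then, based on the learned metric, different K-nearest-neighbor classifiers classify the remaining unlabeled samples; the samples on which their votes disagree most are handed to the expert for labeling and added to the training set, and the metric is learned again. These steps are repeated until a termination condition is met, so that the best metric matrix can be obtained from a limited subset of labeled samples. Comparative experiments on standard datasets show that ASVM2L, without degrading classification accuracy, extracts more label information from the fewest labeled samples and therefore achieves better metric performance.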
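The sample-selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the K-nearest-neighbor classifiers differ or how voting disagreement is measured, so the sketch assumes a committee that varies only in K (`ks=(1, 3, 5)`) and uses vote entropy as the disagreement score. The matrix `M` stands in for the metric produced by the SVM metric learner, which is not implemented here, and all function names are hypothetical.

```python
import math
from collections import Counter


def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) for a metric matrix M given as a list of rows."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def knn_predict(x, labeled, M, k):
    """Majority label among the k labeled (point, label) pairs nearest to x under M."""
    nearest = sorted(labeled, key=lambda p: mahalanobis_sq(x, p[0], M))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def vote_entropy(x, labeled, M, ks):
    """Disagreement of a committee of k-NN classifiers, one per k in ks.

    Assumption: vote entropy is used as the disagreement measure;
    the abstract does not name a specific measure.
    """
    votes = Counter(knn_predict(x, labeled, M, k) for k in ks)
    n = len(ks)
    return -sum((c / n) * math.log(c / n) for c in votes.values())


def select_query(unlabeled, labeled, M, ks=(1, 3, 5)):
    """Index of the unlabeled sample on which the committee disagrees most."""
    return max(range(len(unlabeled)),
               key=lambda i: vote_entropy(unlabeled[i], labeled, M, ks))
```

In a full active loop, the sample returned by `select_query` would be labeled by the expert, moved into `labeled`, and `M` would be relearned before the next round; the termination condition is left open here because the abstract does not specify it.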