Computer Science ›› 2025, Vol. 52 ›› Issue (1): 289-297. doi: 10.11896/jsjkx.231100075

• Artificial Intelligence •

Active Learning Based on Maximum Influence Set

LI Yahe, XIE Zhipeng   

  1. School of Computer Science, Fudan University, Shanghai 200438, China
  • Received: 2023-11-13 Revised: 2023-12-20 Online: 2025-01-15 Published: 2025-01-09
  • Corresponding author: XIE Zhipeng (xiezp@fudan.edu.cn)
  • About author: LI Yahe, born in 2000, postgraduate (yaheli21@m.fudan.edu.cn). His main research interests include machine learning and natural language processing.
    XIE Zhipeng, born in 1976, Ph.D, associate professor, Ph.D supervisor, is a member of CCF (No.50903M). His main research interests include data mining, machine learning and natural language processing.

Abstract: With the continuous progress of deep learning, it has been widely applied in numerous fields. However, training deep models requires a large amount of labeled data, at a high cost in time and resources. How to maximize model performance with the least amount of labeled data has therefore become an important research topic. Active learning addresses this issue by selecting the most valuable samples for annotation and using them for model training. Traditional active learning approaches usually concentrate on uncertainty or diversity, aiming to query the most difficult or most representative samples. However, these methods typically account for only one-sided effects and overlook the interaction between labeled and unlabeled data in active learning scenarios. Another line of active learning methods employs auxiliary networks for sample selection, but these usually incur higher computational complexity. This paper proposes a novel active learning approach designed to maximize the model's overall performance gain by taking sample-to-sample interactions into account and jointly measuring local uncertainty and the influence of candidate samples on other samples. The method first estimates the influence of samples on one another from the distances between their hidden-layer representations, then estimates the potential gain a candidate sample can bring from that influence and the uncertainty of the unlabeled samples, and iteratively selects the sample with the highest global gain for annotation. The proposed method is further compared with other active learning strategies on a series of tasks across several domains. Experimental results demonstrate that it outperforms all baselines on all tasks. Further quantitative analysis shows that the method strikes a good balance between uncertainty and diversity, and explores which factors should be emphasized at different stages of active learning.
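The selection procedure described in the abstract can be made concrete with a minimal Python sketch. This is not the authors' implementation: it assumes hidden-layer embeddings and predicted class probabilities are already available as NumPy arrays, uses an RBF kernel over embedding distances as a stand-in for the paper's distance-based influence estimate, and uses predictive entropy as the uncertainty measure; the function names (predictive_entropy, influence_matrix, select_batch) and the bandwidth parameter are illustrative assumptions.

```python
import numpy as np

def predictive_entropy(probs):
    # probs: (n, num_classes) predicted probabilities for the unlabeled
    # pool; higher entropy = more uncertain sample.
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def influence_matrix(embeddings, bandwidth=1.0):
    # Influence between samples estimated from distances between their
    # hidden-layer representations; the RBF decay is one illustrative
    # choice (the abstract only specifies distance-based influence).
    sq_dists = np.sum(
        (embeddings[:, None, :] - embeddings[None, :, :]) ** 2, axis=-1
    )
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def select_batch(embeddings, probs, batch_size):
    # Greedy loop: each step scores every candidate by its influence on
    # the still-unlabeled samples, weighted by their uncertainty, and
    # picks the candidate with the largest estimated global gain.
    uncertainty = predictive_entropy(probs)
    influence = influence_matrix(embeddings)
    n = embeddings.shape[0]
    remaining = np.ones(n, dtype=bool)
    selected = []
    for _ in range(batch_size):
        gain = influence[:, remaining] @ uncertainty[remaining]
        gain[~remaining] = -np.inf  # exclude already-selected samples
        best = int(np.argmax(gain))
        selected.append(best)
        remaining[best] = False
    return selected
```

In a full active learning loop, the indices returned by select_batch would be sent to an annotator, moved into the labeled pool, and the model retrained before the next query round.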

Key words: Active learning, Deep learning, Uncertainty

CLC Number: TP391