Computer Science ›› 2025, Vol. 52 ›› Issue (3): 222-230. doi: 10.11896/jsjkx.240100191
沈雅馨1, 高利剑1, 毛启容1,2
SHEN Yaxin1, GAO Lijian1, MAO Qirong1,2
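Abstract: Existing semi-supervised sound event detection methods train directly on strongly labeled synthetic samples, weakly labeled real samples, and unlabeled real samples to alleviate the shortage of labeled data. However, an unavoidable distribution gap exists between the synthetic and real data domains. This gap interferes with the direction of gradient optimization and thereby limits the model's generalization ability. To address this problem, a novel learning paradigm for semi-supervised sound event detection based on meta-learning, MMT (Meta Mean Teacher), is proposed. Specifically, each training batch is split into a meta-training set composed of synthetic samples and a meta-test set composed of real samples. The meta-gradient computed on the meta-training set then guides the gradient update on the meta-test set, so that the model perceives and learns more generalizable knowledge. Comparative experiments on the test set of the DCASE2021 Task 4 dataset show that, relative to the official baseline, MMT improves F1, PSDS1, and PSDS2 by 8.9%, 6.6%, and 1.1%, respectively. Compared with current state-of-the-art methods, MMT also shows a significant performance advantage.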
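The abstract does not state the exact update rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a first-order, MLDG-style step: a virtual update is taken along the gradient from the synthetic (meta-training) part of the batch, the gradient of the real (meta-test) part is evaluated at the virtually updated weight, and both gradients are combined. The toy scalar model, the function names (`grad_mse`, `meta_step`, `ema_update`), and the hyperparameters `inner_lr`, `outer_lr`, `beta`, and `alpha` are all hypothetical and chosen for illustration. The teacher update is the standard exponential moving average used by Mean Teacher.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def meta_step(w, synth, real, inner_lr=0.05, outer_lr=0.05, beta=1.0):
    """One first-order meta step (assumed rule, not taken from the paper).

    synth: (xs, ys) from the synthetic domain, used as the meta-training set.
    real:  (xs, ys) from the real domain, used as the meta-test set.
    """
    g_train = grad_mse(w, *synth)          # gradient on the meta-training set
    w_virtual = w - inner_lr * g_train     # virtual step guided by that gradient
    g_test = grad_mse(w_virtual, *real)    # meta-test gradient at the virtual weight
    # First-order approximation: second-derivative terms are ignored.
    return w - outer_lr * (g_train + beta * g_test)


def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher weight averaging: the teacher tracks the student by EMA."""
    return alpha * teacher + (1.0 - alpha) * student


# Toy domain shift: synthetic data follows y = 2.0 * x, real data y = 2.2 * x.
xs = [1.0, 2.0, 3.0]
synth = (xs, [2.0 * x for x in xs])
real = (xs, [2.2 * x for x in xs])

student, teacher = 0.0, 0.0
for _ in range(200):
    student = meta_step(student, synth, real)
    teacher = ema_update(teacher, student, alpha=0.9)

print(f"student={student:.4f} teacher={teacher:.4f}")
```

In this toy setting the learned weight settles between the synthetic slope and the real slope, which illustrates how the meta-test term pulls the solution away from a fit to the synthetic domain alone. A real system would replace the scalar model with the detection network and the squared error with the supervised and consistency losses.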