Computer Science, 2025, Vol. 52, Issue (3): 222-230. doi: 10.11896/jsjkx.240100191

• Computer Graphics & Multimedia •

Semi-supervised Sound Event Detection Based on Meta Learning

SHEN Yaxin1, GAO Lijian1, MAO Qirong1,2

  1. 1 College of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
    2 Jiangsu Province Big Data Ubiquitous Perception and Intelligent Agriculture Application Engineering Research Center, Zhenjiang, Jiangsu 212013, China
  • Received: 2024-01-29 Revised: 2024-05-23 Online: 2025-03-15 Published: 2025-03-07
  • Corresponding author: MAO Qirong (mao_qr@ujs.edu.cn)
  • About author: (2212108045@stmail.ujs.edu.cn)
    SHEN Yaxin, born in 1999, postgraduate. Her main research interests include multimedia intelligent analysis.
    MAO Qirong, born in 1975, professor, Ph.D. supervisor, is a member of CCF (No.17370S). Her main research interests include multimedia intelligent analysis and emotional computing.
  • Supported by:
    National Natural Science Foundation of China (62176106), Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_3668), and Special Scientific Research Project of School of Emergency Management, Jiangsu University (KY-A-01).

Abstract: Existing semi-supervised sound event detection methods train directly on strongly labeled synthetic samples, weakly labeled real samples, and unlabeled real samples to alleviate the shortage of labeled data. However, there is an inevitable distribution gap between the synthetic and real domains. This gap can interfere with the direction of gradient optimization and thereby restrict the model's generalization ability. To address this problem, a novel semi-supervised sound event detection learning paradigm, Meta Mean Teacher (MMT), is proposed based on meta-learning. Specifically, each training batch is divided into a meta-training set consisting of synthetic samples and a meta-test set consisting of real samples. The meta-gradient computed on the meta-training set guides the gradient update on the meta-test set, allowing the model to perceive and learn more generalizable knowledge. Comparative experiments on the test set of the DCASE2021 Task 4 dataset show that, compared with the official baseline, MMT achieves a relative improvement of 8.9%, 6.6%, and 1.1% in the F1, PSDS1, and PSDS2 metrics, respectively. Compared with current state-of-the-art methods, MMT also demonstrates a significant performance advantage.

Key words: Sound event detection, Meta learning, Consistency regularization, Semi-supervised learning, Deep learning
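
The batch split described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so everything below is an assumption made for illustration: a toy linear model with squared-error loss, a first-order approximation in which the meta-training gradient produces a virtual step and the meta-test gradient is evaluated at that virtual point, and a mean-teacher weight update by exponential moving average. The names `mse_grad`, `meta_step`, `inner_lr`, `outer_lr`, `beta`, and `ema` are hypothetical, and the toy "real" samples carry full targets, unlike the weakly labeled and unlabeled real data used in the paper.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error for a linear model y = w . x."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(xs)
    return grad


def meta_step(student, teacher, synth, real,
              inner_lr=0.05, outer_lr=0.05, beta=1.0, ema=0.99):
    """One illustrative meta-learning step (assumed, not the paper's exact rule).

    synth: (inputs, targets) for the meta-training set (synthetic samples).
    real:  (inputs, targets) for the meta-test set (real samples).
    """
    xs_s, ys_s = synth
    xs_r, ys_r = real

    # 1. Meta-training gradient on the synthetic part of the batch.
    g_train = mse_grad(student, xs_s, ys_s)

    # 2. Virtual update: where the student would land after a synthetic-only step.
    virtual = [w - inner_lr * g for w, g in zip(student, g_train)]

    # 3. Meta-test gradient on the real part, evaluated at the virtual weights.
    g_test = mse_grad(virtual, xs_r, ys_r)

    # 4. First-order combined update of the student weights.
    new_student = [w - outer_lr * (gt + beta * ge)
                   for w, gt, ge in zip(student, g_train, g_test)]

    # 5. Mean-teacher update: exponential moving average of the student weights.
    new_teacher = [ema * t + (1.0 - ema) * s
                   for t, s in zip(teacher, new_student)]
    return new_student, new_teacher
```

Evaluating the real-sample gradient at the virtual weights, rather than at the current weights, is what lets the synthetic step influence how the model is updated on real data in this sketch. A full second-order variant would also differentiate through the virtual step.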

CLC Number: 

  • TP391