Computer Science ›› 2025, Vol. 52 ›› Issue (3): 222-230. doi: 10.11896/jsjkx.240100191

• Computer Graphics & Multimedia •

Semi-supervised Sound Event Detection Based on Meta Learning

SHEN Yaxin1, GAO Lijian1, MAO Qirong1,2

  1. College of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  2. Jiangsu Province Big Data Ubiquitous Perception and Intelligent Agriculture Application Engineering Research Center, Zhenjiang, Jiangsu 212013, China
  • Received: 2024-01-29 Revised: 2024-05-23 Online: 2025-03-15 Published: 2025-03-07
  • About author: SHEN Yaxin, born in 1999, postgraduate. Her main research interests include multimedia intelligent analysis.
    MAO Qirong, born in 1975, professor, Ph.D. supervisor, is a member of CCF (No. 17370S). Her main research interests include multimedia intelligent analysis and emotional computing.
  • Supported by:
    National Natural Science Foundation of China (62176106), Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_3668), and Special Scientific Research Project of School of Emergency Management, Jiangsu University (KY-A-01).

Abstract: Existing semi-supervised sound event detection methods train directly on strongly labeled synthetic samples, weakly labeled real samples, and unlabeled real samples to alleviate the shortage of labeled data. However, there is an inevitable distribution gap between the synthetic and real domains, which can interfere with the direction of gradient optimization and thereby restrict the generalization ability of these models. To address this challenge, a novel semi-supervised learning paradigm for sound event detection, meta mean teacher (MMT), is proposed based on meta-learning. Specifically, each batch of training data is divided into a meta-training set of synthetic samples and a meta-test set of real samples. The meta-gradient computed on the meta-training set guides the update of the meta-test gradient, allowing the model to perceive and learn more generalizable knowledge. Experimental results on the DCASE2021 Task 4 dataset show that, compared with the official baseline, MMT achieves relative improvements of 8.9%, 6.6%, and 1.1% in the F1, PSDS1, and PSDS2 metrics, respectively. Compared with current state-of-the-art methods in the field, MMT still demonstrates a significant performance advantage.
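
The batch split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the update rule, so the code below assumes a first-order meta-update in the style of meta-learning for domain generalization, applied to a toy linear model with a squared-error loss. The function names (`mse_grad`, `meta_step`, `ema_update`), the learning rates, and the decay value are all hypothetical; `ema_update` shows the standard mean-teacher weight averaging.

```python
import numpy as np

def mse_grad(w, x, y):
    """Gradient of 0.5 * mean((x @ w - y) ** 2) with respect to w."""
    return x.T @ (x @ w - y) / len(x)

def meta_step(w, synth, real, inner_lr=0.1, outer_lr=0.1):
    """One assumed first-order meta-update on a single batch.

    synth: (x, y) synthetic samples, used as the meta-training set.
    real:  (x, y) real samples, used as the meta-test set.
    """
    xs, ys = synth
    xr, yr = real
    # Meta-train: gradient on the synthetic part of the batch.
    g_train = mse_grad(w, xs, ys)
    # Virtual look-ahead step along the meta-train gradient.
    w_virtual = w - inner_lr * g_train
    # Meta-test: gradient on the real part, evaluated at the look-ahead
    # weights, so the synthetic step influences the real-data gradient.
    g_test = mse_grad(w_virtual, xr, yr)
    # First-order approximation: combine both gradients in one update.
    return w - outer_lr * (g_train + g_test)

def ema_update(teacher, student, decay=0.99):
    """Mean-teacher update: teacher weights track an exponential
    moving average of the student weights."""
    return decay * teacher + (1.0 - decay) * student
```

In a training loop under these assumptions, `meta_step` would update the student weights on each batch and `ema_update` would then refresh the teacher weights from the student.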

Key words: Sound event detection, Meta learning, Consistency regularization, Semi-supervised learning, Deep learning

CLC Number: 

  • TP391
[1] ZHONG Yue, GU Jieming. 3D Reconstruction of Single-view Sketches Based on Attention Mechanism and Contrastive Loss [J]. Computer Science, 2025, 52(3): 77-85.
[2] WANG Yuan, HUO Peng, HAN Yi, CHEN Tun, WANG Xiang, WEN Hui. Survey on Deep Learning-based Meteorological Forecasting Models [J]. Computer Science, 2025, 52(3): 112-126.
[3] HAN Lin, WANG Yifan, LI Jianan, GAO Wei. Automatic Scheduling Search Optimization Method Based on TVM [J]. Computer Science, 2025, 52(3): 268-276.
[4] WANG Tao, BAI Xuefei, WANG Wenjian. Selective Feature Fusion for 3D CT Image Segmentation of Renal Cancer Based on Edge Enhancement [J]. Computer Science, 2025, 52(3): 41-49.
[5] WANG Jie, WANG Chuangye, XIE Jiucheng, GAO Hao. Animatable Head Avatar Reconstruction Algorithm Based on Region Encoding [J]. Computer Science, 2025, 52(3): 50-57.
[6] SUN Rui, WANG Fei, FENG Huidong, ZHANG Xudong, GAO Jun. Research Progress in Facial Presentation Attack Detection Methods Based on Deep Learning [J]. Computer Science, 2025, 52(2): 323-335.
[7] DING Ruiyang, SUN Lei, DAI Leyu, ZANG Weifei, XU Bayi. Generation Method for Adversarial Networks Traffic Based on Universal Perturbations [J]. Computer Science, 2025, 52(2): 336-343.
[8] CHEN Zigang, PAN Ding, LENG Tao, ZHU Haihua, CHEN Long, ZHOU Yousheng. Explanation Robustness Adversarial Training Method Based on Local Gradient Smoothing [J]. Computer Science, 2025, 52(2): 374-379.
[9] ZHANG Jian, LI Hui, ZHANG Shengming, WU Jie, PENG Ying. Review of Pre-training Methods for Visually-rich Document Understanding [J]. Computer Science, 2025, 52(1): 259-276.
[10] LI Yahe, XIE Zhipeng. Active Learning Based on Maximum Influence Set [J]. Computer Science, 2025, 52(1): 289-297.
[11] ZHANG Xin, ZHANG Han, NIU Manyu, JI Lixia. Adversarial Sample Detection in Computer Vision:A Survey [J]. Computer Science, 2025, 52(1): 345-361.
[12] SU Chaoran, ZHANG Dalong, HUANG Yong, DONG An. RF Fingerprint Recognition Based on SE Attention Multi-source Domain Adversarial Network [J]. Computer Science, 2025, 52(1): 412-419.
[13] ZHANG Yusong, XU Shuai, YAN Xingyu, GUAN Donghai, XU Jianqiu. Survey on Cross-city Human Mobility Prediction [J]. Computer Science, 2025, 52(1): 102-119.
[14] LIU Yuming, DAI Yu, CHEN Gongping. Review of Federated Learning in Medical Image Processing [J]. Computer Science, 2025, 52(1): 183-193.
[15] LI Yujie, MA Zihang, WANG Yifu, WANG Xinghe, TAN Benying. Survey of Vision Transformers(ViT) [J]. Computer Science, 2025, 52(1): 194-209.