Computer Science, 2025, Vol. 52, Issue (3): 222-230. doi: 10.11896/jsjkx.240100191

• Computer Graphics & Multimedia •

Semi-supervised Sound Event Detection Based on Meta Learning

SHEN Yaxin1, GAO Lijian1, MAO Qirong1,2

  1. College of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  2. Jiangsu Province Big Data Ubiquitous Perception and Intelligent Agriculture Application Engineering Research Center, Zhenjiang, Jiangsu 212013, China
  • Received: 2024-01-29  Revised: 2024-05-23  Online: 2025-03-15  Published: 2025-03-07
  • About author: SHEN Yaxin, born in 1999, postgraduate student. Her main research interests include multimedia intelligent analysis.
    MAO Qirong, born in 1975, professor, Ph.D. supervisor, is a member of CCF (No. 17370S). Her main research interests include multimedia intelligent analysis and emotional computing.
  • Supported by:
    National Natural Science Foundation of China (62176106), Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_3668), and Special Scientific Research Project of School of Emergency Management, Jiangsu University (KY-A-01).

Abstract: Existing semi-supervised sound event detection methods train directly on strongly labeled synthetic samples, weakly labeled real samples, and unlabeled real samples to alleviate the shortage of labeled data. However, there is an inevitable distribution gap between the synthetic and real domains, which can interfere with the direction of gradient optimization and thereby restrict the generalization ability of these models. To address this challenge, a novel semi-supervised learning paradigm for sound event detection, meta mean teacher (MMT), is proposed based on meta-learning. Specifically, each batch of training data is divided into a meta-training set consisting of synthetic samples and a meta-test set consisting of real samples. The meta-gradient calculated on the meta-training set guides the update of the meta-test gradient, allowing the model to perceive and learn more generalizable knowledge. Experimental results on the DCASE2021 Task 4 dataset show that, compared with the official baseline, MMT achieves relative improvements of 8.9%, 6.6%, and 1.1% in the F1, PSDS1, and PSDS2 metrics, respectively. Compared with current state-of-the-art methods in the field, MMT still demonstrates a significant performance advantage.
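
The batch split described in the abstract can be illustrated with a small sketch. This is not the authors' MMT implementation: it assumes a generic first-order meta-learning update (a virtual step on the synthetic meta-training set, followed by a gradient evaluated on the real meta-test set at the virtual weights), a toy scalar linear model, and a squared-error loss. The names `meta_step`, `inner_lr`, `outer_lr`, and `beta` are hypothetical, and the mean-teacher consistency term is omitted.

```python
def loss(w, batch):
    """Half mean squared error of the scalar model y = w * x on a batch of (x, y) pairs."""
    return 0.5 * sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def grad(w, batch):
    """Gradient of loss(w, batch) with respect to the scalar weight w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def meta_step(w, synthetic, real, inner_lr=0.1, outer_lr=0.1, beta=1.0):
    """One first-order meta update (assumed form, not the paper's exact rule).

    synthetic: meta-training set; real: meta-test set, both taken from one batch.
    """
    g_train = grad(w, synthetic)          # meta-training gradient on synthetic data
    w_virtual = w - inner_lr * g_train    # virtual step guided by that gradient
    g_test = grad(w_virtual, real)        # meta-test gradient at the virtual weights
    return w - outer_lr * (g_train + beta * g_test)


# Toy domain gap: synthetic targets follow y = 1.2 * x, real targets follow y = 1.0 * x.
xs = (-1.0, -0.5, 0.5, 1.0)
synthetic = [(x, 1.2 * x) for x in xs]
real = [(x, 1.0 * x) for x in xs]

w = 0.0
initial_real_loss = loss(w, real)
for _ in range(200):
    w = meta_step(w, synthetic, real)
final_real_loss = loss(w, real)

print(f"weight after meta-training: {w:.4f}")
print(f"real-domain loss: {initial_real_loss:.4f} -> {final_real_loss:.4f}")
```

In this toy setting the learned weight settles between the synthetic and real optima, because the real-domain gradient is evaluated after the synthetic step rather than independently of it. A full treatment would backpropagate through the virtual step instead of using the first-order approximation.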

Key words: Sound event detection, Meta learning, Consistency regularization, Semi-supervised learning, Deep learning

CLC Number: 

  • TP391