Computer Science ›› 2020, Vol. 47 ›› Issue (1): 159-164. doi: 10.11896/jsjkx.190200365

• Computer Graphics & Multimedia •

Environment-Assisted Multi-Task Method for Polyphonic Sound Event Detection

高利剑,毛启容   

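  1. (School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013)
  • Received: 2019-02-26 Published: 2020-01-19
  • Corresponding author: MAO Qi-rong (mao_qr@ujs.edu.cn)
  • Supported by:
    Key Project of the General Joint Fund of the National Natural Science Foundation of China (U1836220); General Program of the National Natural Science Foundation of China (61672267)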

Environment-assisted Multi-task Learning for Polyphonic Acoustic Event Detection

GAO Li-jian, MAO Qi-rong

  1. (School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China)
  • Received: 2019-02-26 Published: 2020-01-19
  • About author: GAO Li-jian, born in 1993, postgraduate. His main research interests include multimedia intelligent analysis. MAO Qi-rong, born in 1975, professor, Ph.D. supervisor, is a member of the China Computer Federation (CCF). Her main research interests include multimedia intelligent analysis and emotional computing.
  • Supported by:
    This work was supported by the Key Projects of the National Natural Science Foundation of China (U1836220) and the National Natural Science Foundation of China (61672267).

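Abstract: In polyphonic sound event detection, the signals of different events are mixed together, and global features extracted from the mixed signal cannot represent each individual event well, so detection performance drops sharply when the number of sound events increases or the environment changes. Existing methods have not considered the effect of environmental changes on detection performance. To address this, this paper proposes an environment-assisted sound event detection model based on multi-task learning (Environment-Assisted Multi-Task, EAMT). The model consists of two core parts, a scene classifier and an event detector. The scene classifier learns environment context features, which are fused with the sound event features as additional information for event detection and assist sound event detection through multi-task learning, improving the model's robustness to environmental changes and its performance on multi-target event detection. Using Freesound, a mainstream public dataset in the sound event detection field, and the commonly used F1 score as the evaluation metric, the proposed model is compared with a baseline model (Deep Neural Network, DNN) and a mainstream model (Convolutional Recurrent Neural Network, CRNN) in three groups of comparative experiments. The results show that: 1) compared with single-task models, the multi-task EAMT model improves both scene classification and event detection, and introducing environment context features further improves sound event detection; 2) the EAMT model is more robust to environmental changes: when the environment changes, its event detection F1 score is 2%~5% higher than that of the other models; 3) when the number of target sound events increases, the EAMT model still stands out, achieving a 2%~10% improvement in F1 over the other models.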

Keywords: Multi-task learning, Environment-assisted, Environmental robustness, Sound event detection, Feature fusion

Abstract: Polyphonic Acoustic Event Detection (AED) is a challenging task because the signals of different events are mixed together, and the global features extracted from the mixture cannot represent each event well. This leads to suboptimal AED performance, especially when the number of sound events increases or the environment changes. Existing methods do not consider the impact of environmental changes on detection performance. Therefore, an Environment-Assisted Multi-Task learning (EAMT) method for AED is proposed. The EAMT model consists of two core parts: an environment classifier and a sound event detector. The environment classifier learns environment context features, which are fused with the sound event features as additional information for event detection and assist sound event detection through multi-task learning, so as to improve the robustness of the EAMT model to environmental changes and its performance on polyphonic event detection. Based on the Freesound dataset, one of the mainstream open datasets in the field of AED, and the commonly used F1 score as the evaluation metric, three sets of comparative experiments were set up to compare the proposed method with a DNN baseline and with CRNN, one of the most popular methods. The experimental results show that: 1) compared with single-task models, the EAMT model improves the performance of both environment classification and event detection, and the introduction of environment context features further improves acoustic event detection; 2) the EAMT model is more robust than DNN and CRNN, with an F1 score 2% to 5% higher than the other models when the environment changes; 3) when the number of target events increases, the EAMT model still performs well, achieving an improvement of about 2% to 10% in F1 score over the other models.
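The abstract describes the architecture only at a high level, so the following is a minimal sketch of the idea rather than the authors' model: an environment branch produces a context vector, that vector is fused with the event features, and both tasks are trained with one joint loss. The layer sizes, the use of concatenation as the fusion step, the single dense layers, and the weighting factor `alpha` are all assumptions made for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical sizes: input features, hidden units, scene classes, event classes.
N_FEAT, N_HID, N_SCENES, N_EVENTS = 8, 4, 3, 5

def init_layer(n_in, n_out):
    """Random weights (n_out rows of n_in values) and zero biases."""
    w = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def dense(x, layer):
    """Affine map: one output per weight row."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

params = {
    "scene_hidden": init_layer(N_FEAT, N_HID),
    "scene_out": init_layer(N_HID, N_SCENES),
    "event_hidden": init_layer(N_FEAT, N_HID),
    # The event output layer sees event features plus environment context.
    "event_out": init_layer(2 * N_HID, N_EVENTS),
}

def forward(x, params):
    # Environment branch: context features, then one scene label (softmax).
    env_ctx = relu(dense(x, params["scene_hidden"]))
    scene_prob = softmax(dense(env_ctx, params["scene_out"]))
    # Event branch: event features fused with the context by concatenation
    # (an assumed fusion step), then independent sigmoids because several
    # events can be active at once.
    event_feat = relu(dense(x, params["event_hidden"]))
    fused = event_feat + env_ctx
    event_prob = sigmoid(dense(fused, params["event_out"]))
    return scene_prob, event_prob

def joint_loss(scene_prob, event_prob, scene_label, event_labels, alpha=0.5):
    """Weighted sum of scene cross-entropy and mean event binary cross-entropy."""
    eps = 1e-9
    scene_ce = -math.log(scene_prob[scene_label] + eps)
    event_bce = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for y, p in zip(event_labels, event_prob)
    ) / len(event_labels)
    return alpha * scene_ce + (1 - alpha) * event_bce

frame = [random.random() for _ in range(N_FEAT)]
scene_prob, event_prob = forward(frame, params)
loss = joint_loss(scene_prob, event_prob, scene_label=1, event_labels=[1, 0, 0, 1, 0])
print(len(scene_prob), len(event_prob), loss > 0)
```

In a real system the dense layers would presumably be replaced by convolutional or recurrent layers and the joint loss minimized by gradient descent; the sketch only shows how the two tasks share information through the fused features and a single training objective.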

Keywords: Acoustic event detection, Environmental robustness, Environment-assisted, Feature fusion, Multi-task learning
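The results in the abstract are reported as F1 scores. The abstract does not say how the score is averaged, so the sketch below assumes a micro-averaged F1 over all binary (frame, event) decisions; the function name and the toy label matrices are hypothetical.

```python
def f1_score(reference, predicted):
    """Micro-averaged F1 over all (frame, event) binary decisions."""
    tp = fp = fn = 0
    for ref_frame, pred_frame in zip(reference, predicted):
        for ref, pred in zip(ref_frame, pred_frame):
            if pred and ref:
                tp += 1          # event active and detected
            elif pred and not ref:
                fp += 1          # event detected but not active
            elif ref and not pred:
                fn += 1          # event active but missed
    if tp == 0:
        return 0.0               # avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: 3 frames, 3 event classes; rows are frames, columns are events.
reference = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
predicted = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
score = f1_score(reference, predicted)
print(round(score, 3))
```

In the toy example there are 4 true positives, 1 false positive and 1 false negative, so precision and recall are both 0.8 and the F1 score is 0.8.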

CLC Number:

  • TP391