Computer Science (计算机科学) ›› 2020, Vol. 47 ›› Issue (1): 159-164. doi: 10.11896/jsjkx.190200365
GAO Li-jian (高利剑), MAO Qi-rong (毛启容)
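Abstract: In polyphonic sound event detection, the signals of different events overlap, and global features extracted from the mixed audio signal cannot represent each individual event well. As a result, detection performance drops sharply when the number of sound events increases or the environment changes. Existing methods have not considered the effect of environmental change on detection performance. This paper therefore proposes an environment-assisted sound event detection model based on multi-task learning (Environment-Assisted Multi-Task, EAMT). The model has two core components, a scene classifier and an event detector. The scene classifier learns environmental context features, which are fused with the sound event features as additional information for event detection; multi-task learning then assists sound event detection, improving both the model's robustness to environmental change and its performance on multi-target event detection. Using Freesound, a mainstream public dataset in the sound event detection field, and the F1 score as the standard evaluation metric, the proposed model is compared with a baseline model (Deep Neural Network, DNN) and a mainstream model (Convolutional Recurrent Neural Network, CRNN) in three groups of comparative experiments. The results show that: 1) compared with single-task models, the multi-task EAMT model improves both scene classification and event detection, and introducing environmental context features further improves sound event detection; 2) EAMT is more robust to environmental change: when the environment changes, its event detection F1 score is 2% to 5% higher than that of the other models; 3) when the number of target sound events increases, EAMT still outperforms the other models, improving the F1 score by 2% to 10%.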
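The abstract does not give the network layers, the fusion operator, or the loss weighting, so the following is only a minimal sketch of the general idea under stated assumptions: fusion is taken to be plain concatenation, both heads are single linear layers, and the joint loss is a weighted sum of a multi-label binary cross-entropy (events) and a categorical cross-entropy (scene). The names `eamt_forward`, `joint_loss`, `f1_score`, `params`, and `scene_weight` are hypothetical and do not come from the paper.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def linear(x, W, b):
    # W is a list of rows, one row of weights per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def eamt_forward(event_feat, scene_feat, params):
    # Scene head: single-label classification over scenes.
    scene_probs = softmax(linear(scene_feat, params["W_scene"], params["b_scene"]))
    # ASSUMPTION: fusion is concatenation of event and context features.
    fused = list(event_feat) + list(scene_feat)
    # Event head: one independent sigmoid per event class (multi-label).
    event_logits = linear(fused, params["W_event"], params["b_event"])
    event_probs = [sigmoid(v) for v in event_logits]
    return scene_probs, event_probs

def joint_loss(scene_probs, scene_label, event_probs, event_labels, scene_weight=0.5):
    # ASSUMPTION: weighted sum of the two task losses; 0.5 is arbitrary.
    eps = 1e-12
    ce = -math.log(scene_probs[scene_label] + eps)
    bce = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(event_probs, event_labels)
    ) / len(event_labels)
    return bce + scene_weight * ce

def f1_score(pred, truth):
    # F1 over flattened binary event decisions.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch the scene loss acts as an auxiliary training signal, while the context features reach the event head directly through the concatenated input; a real implementation would learn both feature extractors jointly rather than take them as given.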