Computer Science, 2020, Vol. 47, Issue (1): 159-164. doi: 10.11896/jsjkx.190200365

• Computer Graphics & Multimedia

Environment-assisted Multi-task Learning for Polyphonic Acoustic Event Detection

GAO Li-jian, MAO Qi-rong

  1. (School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China)
  • Received: 2019-02-26  Published: 2020-01-19
  • About the authors: GAO Li-jian, born in 1993, postgraduate. His main research interests include multimedia intelligent analysis. MAO Qi-rong, born in 1975, professor, Ph.D. supervisor, is a member of the China Computer Federation (CCF). Her main research interests include multimedia intelligent analysis and affective computing.
  • Supported by:
    This work was supported by the Key Projects of the National Natural Science Foundation of China (1836220) and the National Natural Science Foundation of China (61672267).

Abstract: Polyphonic Acoustic Event Detection (AED) is challenging because the recorded sound mixes signals from different events, and features extracted from the mixture as a whole cannot represent each event well. This leads to suboptimal AED performance, especially when the number of sound events increases or the environment changes. Existing methods do not consider the impact of environmental changes on detection performance. This paper therefore proposes an Environment-Assisted Multi-Task learning (EAMT) method for AED. The EAMT model consists of two core parts: an environment classifier and a sound event detector. The environment classifier learns environment context features, which serve as additional information for event detection: through multi-task learning, they are fused with the sound event features to assist sound event detection, improving both the robustness of the model to environmental changes and its polyphonic event detection performance. Using the Freesound dataset, one of the mainstream open datasets in the field of AED, and the commonly used F1 score as the evaluation metric, three sets of comparative experiments compare the proposed method with a DNN baseline and with CRNN, one of the most popular methods. The experimental results show that, compared with single-task models, EAMT improves the performance of both environment classification and event detection, and that introducing environment context features further improves acoustic event detection. EAMT is more robust than DNN and CRNN: when the environment changes, its F1 score is 2% to 5% higher than that of the other models. When the number of target events increases, EAMT still performs strongly, improving the F1 score by about 2% to 10% over the other models.
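The abstract describes the EAMT structure only at a high level: an environment branch whose context features are fused with event features, trained jointly on two tasks and scored with F1. The sketch below illustrates that idea in plain Python under several assumptions that the abstract does not state: a single input feature vector per segment, one hidden layer per branch, fusion by concatenation, a softmax environment head, an independent sigmoid per event for the polyphonic detector, a joint loss weighted by a hypothetical `lam`, and a micro-averaged segment-based F1 in the style of DCASE evaluation. All names (`EAMTSketch`, `joint_loss`, `segment_f1`) are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def dense(x, w, b):
    """Fully connected layer y = Wx + b, with W stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, t) for t in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases for one dense layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class EAMTSketch:
    """Hypothetical two-branch model: environment classifier + event detector."""

    def __init__(self, n_feat, n_hidden, n_env, n_events, seed=0):
        rng = random.Random(seed)
        self.env_hidden = init_layer(n_hidden, n_feat, rng)  # environment-context branch
        self.env_head = init_layer(n_env, n_hidden, rng)  # environment classifier head
        self.evt_hidden = init_layer(n_hidden, n_feat, rng)  # sound-event branch
        # The detector head sees [event features ; environment context features].
        self.evt_head = init_layer(n_events, 2 * n_hidden, rng)

    def forward(self, x):
        env_ctx = relu(dense(x, *self.env_hidden))
        env_prob = softmax(dense(env_ctx, *self.env_head))  # exactly one environment
        evt_feat = relu(dense(x, *self.evt_hidden))
        fused = evt_feat + env_ctx  # list concatenation = feature fusion (assumed)
        evt_prob = sigmoid(dense(fused, *self.evt_head))  # independent sigmoid per event
        return env_prob, evt_prob


def joint_loss(env_prob, env_label, evt_prob, evt_labels, lam=0.5):
    """Event binary cross-entropy plus lam * environment cross-entropy (lam is assumed)."""
    eps = 1e-7
    env_ce = -math.log(env_prob[env_label] + eps)
    evt_bce = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                   for p, y in zip(evt_prob, evt_labels)) / len(evt_labels)
    return evt_bce + lam * env_ce


def segment_f1(pred, ref, threshold=0.5):
    """Micro-averaged F1 = 2TP / (2TP + FP + FN) over all (segment, event) pairs."""
    tp = fp = fn = 0
    for p_seg, r_seg in zip(pred, ref):
        for p, r in zip(p_seg, r_seg):
            active = p >= threshold
            if active and r == 1:
                tp += 1
            elif active:
                fp += 1
            elif r == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


model = EAMTSketch(n_feat=4, n_hidden=3, n_env=2, n_events=3)
env_prob, evt_prob = model.forward([0.2, -0.1, 0.5, 0.3])
loss = joint_loss(env_prob, env_label=1, evt_prob=evt_prob, evt_labels=[1, 0, 1])
print(segment_f1([[0.9, 0.2], [0.7, 0.6]], [[1, 0], [0, 1]]))  # 0.8
```

The two heads use different output activations on purpose in this sketch: a segment belongs to exactly one environment, so a softmax fits, whereas in polyphonic detection several events can be active at once, so each event gets its own sigmoid and threshold. Concatenation is simply the most basic fusion operator; the paper may use a different one.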

Key words: Acoustic event detection, Environment-assisted, Multi-task learning, Feature fusion, Environmental robustness

CLC Number: TP391