Environment-assisted Multi-task Learning for Polyphonic Acoustic Event Detection

doi:10.11896/jsjkx.190200365

Abstract

Polyphonic acoustic event detection (AED) is challenging because each recording mixes signals from different events, and features extracted from the mixture as a whole cannot represent each individual event well. Detection performance therefore degrades, especially as the number of sound events increases or the environment changes. Existing methods do not consider the impact of environmental changes on detection performance. To address this, an Environment-Assisted Multi-Task learning (EAMT) method for AED is proposed. The EAMT model consists of two core parts, an environment classifier and a sound event detector, where the environment classifier learns environment context features. These features serve as additional information for event detection: they are fused with sound event features and assist sound event detection through multi-task learning, which improves both the robustness of the EAMT model to environmental changes and its performance on polyphonic event detection. Using the Freesound dataset, one of the mainstream open datasets in the AED field, and the F1 score, a general performance evaluation metric, three sets of comparative experiments compare the proposed method with a DNN (baseline) and with CRNN, one of the most popular methods. The experimental results show that, compared with the single-task model, the EAMT model improves the performance of both environment classification and event detection, and that introducing environment context features further improves acoustic event detection. The EAMT model is more robust than DNN and CRNN: its F1 score is 2% to 5% higher than that of the other models when the environment changes. When the number of target events increases, the EAMT model still performs well, achieving an improvement of about 2% to 10% in F1 score over the other models.
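The abstract names three ingredients: fusing environment context features with sound event features, training the two tasks jointly, and scoring with F1. The sketch below illustrates each in plain Python. It is not the authors' implementation. The abstract does not specify the fusion operator, the loss functions, or how F1 is aggregated, so concatenation, the cross-entropy and binary cross-entropy pair, the `env_weight` parameter, and micro-averaging over (frame, event) decisions are all assumptions made for illustration, as are the function names.

```python
import math


def fuse(event_features, env_context):
    """Fuse by concatenation (an assumed operator; the abstract does not name one)."""
    return list(event_features) + list(env_context)


def cross_entropy(probs, target_index, eps=1e-12):
    """Single-label loss for the environment classifier (one environment per clip)."""
    return -math.log(max(probs[target_index], eps))


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Multi-label loss for the event detector, since polyphonic events can overlap."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


def multitask_loss(env_probs, env_target, event_probs, event_targets, env_weight=0.5):
    """Weighted sum of both task losses; env_weight is a hypothetical hyperparameter."""
    env_loss = cross_entropy(env_probs, env_target)
    event_loss = binary_cross_entropy(event_probs, event_targets)
    return env_weight * env_loss + (1.0 - env_weight) * event_loss


def micro_f1(predicted, reference):
    """F1 pooled over every (frame, event) decision in two binary activity matrices."""
    tp = fp = fn = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                tp += 1
            elif p:
                fp += 1
            elif r:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy values only; they are not taken from the paper.
    fused = fuse([0.2, 0.7], [0.9])
    loss = multitask_loss([0.5, 0.5], 0, [0.5, 0.5], [1, 0])
    score = micro_f1([[1, 0], [1, 1]], [[1, 0], [0, 1]])
    print(fused, loss, score)
```

Pooling true positives, false positives, and false negatives before computing F1 means frequent events dominate the score. A per-event (macro) average would weight rare events equally, so which variant applies should be checked against the full paper.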

Key words: Acoustic event detection, Environmental robustness, Environment-assisted, Features fusion, Multi-task learning

CLC Number: TP391

GAO Li-jian, MAO Qi-rong. Environment-assisted Multi-task Learning for Polyphonic Acoustic Event Detection[J]. Computer Science, 2020, 47(1): 159-164.

