Computer Science, 2023, Vol. 50, Issue (3): 191-198. doi: 10.11896/jsjkx.220500259

• Computer Graphics & Multimedia

Sound Event Joint Estimation Method Based on Three-dimension Convolution

MEI Pengcheng, YANG Jibin, ZHANG Qiang, HUANG Xiang   

  1. School of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2022-05-30; Revised: 2022-09-13; Online: 2023-03-15; Published: 2023-03-15
  • About author: MEI Pengcheng, born in 1993, postgraduate. His main research interests include machine learning and acoustic signal processing.
    YANG Jibin, born in 1978, Ph.D., associate professor. His main research interests include speech and acoustic signal processing, machine learning and pattern recognition.
  • Supported by: National Natural Science Foundation of China (62071484).

Abstract: Sound event localization and detection (SELD) is widely used in monitoring and anomaly detection tasks. Deep learning methods, represented by convolutional recurrent neural networks (CRNN), have been applied to improve SELD performance. To further improve localization and detection performance, this paper proposes SELD3Dnet, a method based on three-dimensional (3D) convolutional feature extraction. The amplitude and phase spectra of the input multi-channel acoustic signal are computed, and deep feature representations are extracted by multiple 3D convolution modules. Recurrent neural networks and fully connected layers then estimate the types of the sound events and their locations. When processing multi-channel acoustic signals, 3D convolution operates over time, frequency and signal channel simultaneously, so the correlation between signal channels can be exploited to the maximum extent. Comparative experiments are conducted on the TUT2018 and TAU2019 datasets, and the results show that the overall performance of the proposed method is significantly improved on the TUT2018 REAL and TAU2019 MREAL datasets. On TUT2018 REAL, the F1 score of sound event detection improves by 13.9% and the frame accuracy by 21.1%; on TAU2019 MREAL, the F1 score improves by 10.8% and the frame accuracy by 14.4%. These results verify that the proposed method can effectively overcome the influence of reverberation in real-life scenes.
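The feature pipeline described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' SELD3Dnet: the STFT size, hop length, kernel shape and tensor layout are all hypothetical choices. Here the magnitude and phase spectra are stacked as two input feature maps, and the microphone axis is treated as the depth dimension, so a single 3D kernel spans neighbouring microphones, time frames and frequency bins at once. The convolution is written as a naive loop (cross-correlation, as in most deep learning frameworks) only to make the sliding window explicit.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Single-channel STFT with a Hann window.

    Returns a complex array of shape (frames, n_fft // 2 + 1).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def amp_phase_features(signals, n_fft=512, hop=256):
    """Stack magnitude and phase spectra of every microphone channel.

    signals: (mics, samples) -> features: (2, mics, frames, freq_bins),
    where feature map 0 is the magnitude and feature map 1 is the phase.
    (Hypothetical layout chosen for this sketch.)
    """
    spec = np.stack([stft(ch, n_fft, hop) for ch in signals])
    return np.stack([np.abs(spec), np.angle(spec)])


def conv3d_valid(x, w):
    """Naive 3D cross-correlation with stride 1 and no padding.

    x: (in_ch, D, T, F) with D = microphones, T = frames, F = frequency bins.
    w: (out_ch, in_ch, kd, kt, kf).
    Each output value sums a kd x kt x kf block spanning adjacent
    microphones, frames and bins over all input feature maps.
    """
    out_ch, _, kd, kt, kf = w.shape
    _, depth, frames, bins = x.shape
    out = np.zeros((out_ch, depth - kd + 1, frames - kt + 1, bins - kf + 1))
    for d in range(out.shape[1]):
        for t in range(out.shape[2]):
            for f in range(out.shape[3]):
                patch = x[:, d:d + kd, t:t + kt, f:f + kf]
                out[:, d, t, f] = np.tensordot(w, patch, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
    return out


# Hypothetical example: 4 microphones, 2048 random samples each.
rng = np.random.default_rng(0)
signals = rng.standard_normal((4, 2048))
feats = amp_phase_features(signals, n_fft=256, hop=128)
print(feats.shape)   # (2, 4, 15, 129)
# 8 filters over a 2 (mics) x 3 (frames) x 3 (bins) window.
kernel = 0.1 * rng.standard_normal((8, 2, 2, 3, 3))
out = conv3d_valid(feats, kernel)
print(out.shape)     # (8, 3, 13, 127)
```

In a trainable model the naive loop would be replaced by a framework layer such as PyTorch's `torch.nn.Conv3d`, which expects input of shape (N, C, D, H, W); under the layout assumed above, C would be the two spectral feature maps and D the microphone channels. The recurrent and fully connected layers that output event classes and locations are not shown.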
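The detection results above are reported as F1 scores. As an illustration only, the sketch below computes the segment-based F1 score and error rate (ER) defined by Mesaros, Heittola and Virtanen (2016), the metrics used in the DCASE 2019 SELD evaluation over one-second segments. Whether the paper follows exactly this protocol is an assumption, and the "frame accuracy" figure is not reproduced because its definition is not given here. Each segment is represented as a set of active event labels; the labels in the example are hypothetical.

```python
from typing import List, Set, Tuple


def segment_f1_er(reference: List[Set[str]], estimate: List[Set[str]]) -> Tuple[float, float]:
    """Segment-based F1 score and error rate (Mesaros et al., 2016).

    Each list holds one set of active event labels per segment
    (e.g. one-second segments). Returns (F1, ER); a value is NaN when
    its denominator is zero.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same number of segments")
    tp = fp = fn = n_ref = 0
    substitutions = deletions = insertions = 0
    for ref, est in zip(reference, estimate):
        tp_k = len(ref & est)   # active in both
        fp_k = len(est - ref)   # estimated but absent from the reference
        fn_k = len(ref - est)   # in the reference but missed
        tp += tp_k
        fp += fp_k
        fn += fn_k
        n_ref += len(ref)
        # ER counts a missed/extra pair in the same segment as one substitution.
        substitutions += min(fn_k, fp_k)
        deletions += max(0, fn_k - fp_k)
        insertions += max(0, fp_k - fn_k)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else float("nan")
    er = (substitutions + deletions + insertions) / n_ref if n_ref else float("nan")
    return f1, er


# Hypothetical labels for four segments.
reference = [{"speech"}, {"speech", "dog"}, set(), {"door"}]
estimate = [{"speech"}, {"speech"}, {"phone"}, {"knock"}]
print(segment_f1_er(reference, estimate))  # (0.5, 0.75)
```

Counting errors per segment rather than per frame makes the score tolerant of small onset and offset misalignments, which is one reason segment-based metrics are common for polyphonic detection.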

Key words: Sound event localization and detection, Deep learning, Convolutional neural networks, Three-dimensional convolution, Multi-channel signal

CLC Number: TP391.42