Computer Science (计算机科学), 2023, Vol. 50, Issue 11A: 220900135-7. doi: 10.11896/jsjkx.220900135
何儒汉1,2, 陈一帆1,2, 余永升3, 姜艾森4
HE Ruhan1,2, CHEN Yifan1,2, YU Yongsheng3 and JIANG Aisen4
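Abstract: Neural-network-based sound source localization has attracted wide attention in recent years, but mitigating the loss of implicit direction-of-arrival (DOA) information and coping with small-sample data remain open challenges. This paper therefore proposes a DOA estimation method based on a GRU and a self-attention network. The method uses a GRU, which performs well on small datasets, as the backbone network, compensating for the difficulty of collecting clean sound data. The training set is built from multichannel recordings of sound sources. Feature extraction with the short-time Fourier transform yields Mel spectrograms and acoustic intensity vectors, which are combined into an input feature formed by stacking the multichannel spectrograms with the normalized principal eigenvectors. This avoids the damage to implicit DOA information that occurs when spectrograms are combined with GCC-PHAT features, and so effectively mitigates the loss of implicit DOA information. The input feature is fed to a convolutional recurrent neural network, which is trained with supervision to obtain the model parameters. The model output regresses three-dimensional Cartesian coordinates to obtain the DOA estimate. A self-attention network is added and updated by backpropagation during training, so that the network computes the loss and predicts the association matrix at the same time, which resolves the optimal assignment between predicted and reference locations. Experimental results show that the network achieves high localization accuracy and robustness under different reverberation conditions and signal-to-noise ratios.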
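The two output-side ideas in the abstract, regressing a DOA as three-dimensional Cartesian coordinates and assigning predicted locations to reference locations, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the azimuth/elevation convention is an assumption, and a brute-force search over permutations stands in for the association matrix that the paper predicts with a self-attention network.

```python
import math
from itertools import permutations


def doa_to_unit_vector(azimuth_deg, elevation_deg):
    """Map a DOA (azimuth, elevation in degrees) to a 3-D unit vector.

    Assumed convention: azimuth is measured in the x-y plane from the x axis,
    elevation is measured upward from the x-y plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def unit_vector_to_doa(x, y, z):
    """Recover (azimuth, elevation) in degrees from a regressed (x, y, z)."""
    norm = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    return azimuth, elevation


def angular_error_deg(u, v):
    """Angle in degrees between two 3-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


def best_assignment(predicted, reference):
    """Brute-force optimal one-to-one assignment of predictions to references.

    Assumes both lists have the same length. Returns (perm, cost), where
    perm[i] is the index of the reference matched to predicted[i] and cost is
    the summed angular error in degrees. This is a stand-in for the learned
    association matrix and is only practical for a handful of sources.
    """
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(reference))):
        cost = sum(angular_error_deg(predicted[i], reference[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


if __name__ == "__main__":
    refs = [doa_to_unit_vector(0, 0), doa_to_unit_vector(90, 0)]
    preds = [doa_to_unit_vector(85, 0), doa_to_unit_vector(5, 0)]
    perm, cost = best_assignment(preds, refs)
    print(perm, round(cost, 3))
```

Regressing a unit vector rather than raw angles avoids the wrap-around discontinuity at ±180° azimuth, which is one common motivation for Cartesian DOA outputs; whether that is the authors' stated motivation is not given in the abstract.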