Computer Science, 2025, Vol. 52, Issue (3): 214-221. doi: 10.11896/jsjkx.240100222
• Computer Graphics & Multimedia •
WANG Mengwei, YANG Zhe